2023-10-09 10:20:55,470 INFO [train.py:1099] (2/4) Training started
2023-10-09 10:20:55,470 INFO [train.py:1109] (2/4) Device: cuda:2
2023-10-09 10:20:55,473 INFO [train.py:1121] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '821ebc378e7fb99b8adc81950227963332821e01', 'k2-git-date': 'Wed Jul 19 15:38:25 2023', 'lhotse-version': '1.16.0.dev+git.1db4d97a.clean', 'torch-version': '1.11.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'dev_multi_zh-hans', 'icefall-git-sha1': '919793d-dirty', 'icefall-git-date': 'Thu Sep 7 21:06:37 2023', 'icefall-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/icefall-1.0-py3.9.egg', 'k2-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/k2-1.24.3.dev20230721+cuda10.2.torch1.11.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/lhotse-1.16.0.dev0+git.1db4d97a.clean-py3.9.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-1-1220091118-57c4d55446-mvd6x', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 30, 'start_epoch': 14, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp-w-ctc'), 'bpe_model': 'data/lang_bpe_2000/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 700, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'vocab_size': 2000}
2023-10-09 10:20:55,474 INFO [train.py:1123] (2/4) About to create model
2023-10-09 10:20:56,063 INFO [train.py:1127] (2/4) Number of model parameters: 69651511
2023-10-09 10:20:56,064 INFO [checkpoint.py:112] (2/4) Loading checkpoint from zipformer/exp-w-ctc/epoch-13.pt
2023-10-09 10:21:05,037 INFO [train.py:1142] (2/4) Using DDP
2023-10-09 10:21:05,310 INFO [train.py:1154] (2/4) Loading optimizer state dict
2023-10-09 10:21:06,638 INFO [train.py:1162] (2/4) Loading scheduler state dict
2023-10-09 10:21:06,638 INFO [multi_dataset.py:52] (2/4) About to get multidataset train cuts
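Note on the parameter dump above: the per-stack Zipformer options ('num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'encoder_dim': '192,256,384,512,384,256', ...) are comma-separated strings carrying one value per encoder stack. A minimal Python sketch of how such strings are expanded before the model is built (an illustration, not the icefall source):

    def to_int_list(s: str) -> list:
        # '2,2,3,4,3,2' -> [2, 2, 3, 4, 3, 2], one value per encoder stack
        return [int(tok) for tok in s.split(',')]

    num_encoder_layers = to_int_list('2,2,3,4,3,2')
    downsampling_factor = to_int_list('1,2,4,8,4,2')
    encoder_dim = to_int_list('192,256,384,512,384,256')
    # six stacks, so the per-stack option lists must all have length 6
    assert len(num_encoder_layers) == len(downsampling_factor) == len(encoder_dim) == 6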
2023-10-09 10:21:06,638 INFO [multi_dataset.py:55] (2/4) Loading THCHS-30 in lazy mode
2023-10-09 10:21:06,680 INFO [multi_dataset.py:61] (2/4) Loading Aishell-1 in lazy mode
2023-10-09 10:21:06,683 INFO [multi_dataset.py:67] (2/4) Loading Aishell-2 in lazy mode
2023-10-09 10:21:06,686 INFO [multi_dataset.py:73] (2/4) Loading Aishell-4 in lazy mode
2023-10-09 10:21:06,690 INFO [multi_dataset.py:85] (2/4) Loading ST-CMDS in lazy mode
2023-10-09 10:21:06,691 INFO [multi_dataset.py:89] (2/4) Loading Primewords in lazy mode
2023-10-09 10:21:06,692 INFO [multi_dataset.py:95] (2/4) Loading MagicData in lazy mode
2023-10-09 10:21:06,693 INFO [multi_dataset.py:101] (2/4) Loading Aidatatang_200zh in lazy mode
2023-10-09 10:21:06,694 INFO [multi_dataset.py:107] (2/4) Loading Ali-Meeting in lazy mode
2023-10-09 10:21:06,695 INFO [multi_dataset.py:113] (2/4) Loading WeNetSpeech in lazy mode
2023-10-09 10:21:06,696 INFO [multi_dataset.py:119] (2/4) Loading KeSpeech in lazy mode
2023-10-09 10:22:54,348 INFO [asr_datamodule.py:218] (2/4) Enable MUSAN
2023-10-09 10:22:54,349 INFO [asr_datamodule.py:219] (2/4) About to get Musan cuts
2023-10-09 10:22:56,649 INFO [asr_datamodule.py:243] (2/4) Enable SpecAugment
2023-10-09 10:22:56,649 INFO [asr_datamodule.py:244] (2/4) Time warp factor: 80
2023-10-09 10:22:56,649 INFO [asr_datamodule.py:254] (2/4) Num frame mask: 10
2023-10-09 10:22:56,649 INFO [asr_datamodule.py:267] (2/4) About to create train dataset
2023-10-09 10:22:56,650 INFO [asr_datamodule.py:294] (2/4) Using DynamicBucketingSampler.
2023-10-09 10:23:00,019 INFO [asr_datamodule.py:309] (2/4) About to create train dataloader
2023-10-09 10:23:00,020 INFO [multi_dataset.py:161] (2/4) About to get multidataset dev cuts
2023-10-09 10:23:00,020 INFO [multi_dataset.py:164] (2/4) Loading Aidatatang_200zh DEV set in lazy mode
2023-10-09 10:23:00,022 INFO [multi_dataset.py:170] (2/4) Loading Aishell DEV set in lazy mode
2023-10-09 10:23:00,023 INFO [multi_dataset.py:176] (2/4) Loading Aishell-2 DEV set in lazy mode
2023-10-09 10:23:00,024 INFO [multi_dataset.py:182] (2/4) Loading Ali-Meeting DEV set in lazy mode
2023-10-09 10:23:00,025 INFO [multi_dataset.py:188] (2/4) Loading MagicData DEV set in lazy mode
2023-10-09 10:23:00,026 INFO [multi_dataset.py:194] (2/4) Loading KeSpeech DEV set in lazy mode
2023-10-09 10:23:00,028 INFO [multi_dataset.py:203] (2/4) Loading WeNetSpeech DEV set in lazy mode
2023-10-09 10:23:00,029 INFO [asr_datamodule.py:340] (2/4) About to create dev dataset
2023-10-09 10:23:00,509 INFO [asr_datamodule.py:357] (2/4) About to create dev dataloader
2023-10-09 10:23:00,509 INFO [train.py:1243] (2/4) Loading grad scaler state dict
2023-10-09 10:23:19,456 INFO [train.py:1031] (2/4) Epoch 14, batch 0, loss[loss=0.2201, simple_loss=0.2744, pruned_loss=0.06218, ctc_loss=0.1035, over 16733.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2744, pruned_loss=0.06218, ctc_loss=0.1035, over 16733.00 frames. ], batch size: 95, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:23:19,456 INFO [train.py:1054] (2/4) Computing validation loss
2023-10-09 10:23:33,188 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2325, simple_loss=0.3081, pruned_loss=0.06029, ctc_loss=0.09091, over 1796401.00 frames.
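The datamodule lines above correspond to the sampler settings in the parameter dump ('max_duration': 700, 'num_buckets': 30, 'shuffle': True, 'drop_last': True). A hedged sketch of the lhotse sampler construction they imply, with train_cuts standing in for the combined multidataset CutSet:

    from lhotse import CutSet
    from lhotse.dataset import DynamicBucketingSampler

    def make_train_sampler(train_cuts: CutSet) -> DynamicBucketingSampler:
        # max_duration caps the total seconds of audio per batch (not the
        # number of utterances); bucketing by duration reduces padding waste.
        return DynamicBucketingSampler(
            train_cuts,
            max_duration=700.0,
            num_buckets=30,
            shuffle=True,
            drop_last=True,
        )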
2023-10-09 10:23:33,189 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 13033MB
2023-10-09 10:23:48,868 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.610e+02 3.348e+02 4.018e+02 4.917e+02 9.056e+02, threshold=8.035e+02, percent-clipped=7.0
2023-10-09 10:24:05,927 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2728842.6666666665, ans=0.125
2023-10-09 10:24:13,806 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2728889.3333333335, ans=0.2
2023-10-09 10:24:33,525 INFO [train.py:1031] (2/4) Epoch 14, batch 50, loss[loss=0.2901, simple_loss=0.3456, pruned_loss=0.08654, ctc_loss=0.1538, over 16613.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2877, pruned_loss=0.06482, ctc_loss=0.1122, over 750709.83 frames. ], batch size: 351, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:24:39,626 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=2728982.6666666665, ans=22.5
2023-10-09 10:24:53,476 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2729029.3333333335, ans=0.1
2023-10-09 10:25:20,616 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0
2023-10-09 10:25:24,351 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2729169.3333333335, ans=0.1
2023-10-09 10:25:34,390 INFO [train.py:1031] (2/4) Epoch 14, batch 100, loss[loss=0.248, simple_loss=0.3232, pruned_loss=0.06275, ctc_loss=0.1182, over 16187.00 frames. ], tot_loss[loss=0.2393, simple_loss=0.2987, pruned_loss=0.06664, ctc_loss=0.1165, over 1305889.88 frames. ], batch size: 463, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:25:41,642 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2729216.0, ans=0.2
2023-10-09 10:25:49,380 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.490e+02 3.274e+02 3.784e+02 4.390e+02 8.009e+02, threshold=7.568e+02, percent-clipped=0.0
2023-10-09 10:25:54,002 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2729262.6666666665, ans=0.0
2023-10-09 10:26:19,008 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0
2023-10-09 10:26:30,560 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2729402.6666666665, ans=0.125
2023-10-09 10:26:34,425 INFO [train.py:1031] (2/4) Epoch 14, batch 150, loss[loss=0.2612, simple_loss=0.3314, pruned_loss=0.07066, ctc_loss=0.1242, over 16900.00 frames. ], tot_loss[loss=0.2486, simple_loss=0.3124, pruned_loss=0.06826, ctc_loss=0.1205, over 1750383.28 frames. ], batch size: 292, lr: 2.60e-03, grad_scale: 1.0
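The bracketed loss entries decompose the objective: loss is a weighted sum of simple_loss, pruned_loss, and ctc_loss. With simple_loss_scale=0.5 and ctc_loss_scale=0.2 from the parameter dump, and a pruned-loss weight of 1.0 (training resumed at epoch 14, long past warm_step=2000), the reported totals are reproduced exactly; a quick check in Python:

    # Weighted combination implied by the config; the 1.0 pruned-loss
    # weight past warm_step is an assumption consistent with these logs.
    def total_loss(simple: float, pruned: float, ctc: float) -> float:
        return 0.5 * simple + 1.0 * pruned + 0.2 * ctc

    print(round(total_loss(0.2744, 0.06218, 0.1035), 4))   # 0.2201 (batch 0)
    print(round(total_loss(0.3081, 0.06029, 0.09091), 4))  # 0.2325 (validation)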
2023-10-09 10:26:35,767 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2729449.3333333335, ans=0.0
2023-10-09 10:26:40,310 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2729449.3333333335, ans=0.125
2023-10-09 10:26:41,292 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2729449.3333333335, ans=0.125
2023-10-09 10:27:24,758 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2729636.0, ans=0.0
2023-10-09 10:27:31,184 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2729636.0, ans=0.025
2023-10-09 10:27:36,036 INFO [train.py:1031] (2/4) Epoch 14, batch 200, loss[loss=0.2095, simple_loss=0.2769, pruned_loss=0.05281, ctc_loss=0.09132, over 16651.00 frames. ], tot_loss[loss=0.247, simple_loss=0.3115, pruned_loss=0.06742, ctc_loss=0.119, over 2092075.71 frames. ], batch size: 151, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:27:36,403 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2729682.6666666665, ans=0.2
2023-10-09 10:27:41,144 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.84 vs. limit=15.0
2023-10-09 10:27:54,276 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.044e+02 3.577e+02 4.251e+02 7.739e+02, threshold=7.154e+02, percent-clipped=1.0
2023-10-09 10:27:59,459 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2729729.3333333335, ans=0.2
2023-10-09 10:27:59,516 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2729729.3333333335, ans=0.0
2023-10-09 10:28:26,664 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2729869.3333333335, ans=0.0
2023-10-09 10:28:27,125 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0
2023-10-09 10:28:35,918 INFO [train.py:1031] (2/4) Epoch 14, batch 250, loss[loss=0.2876, simple_loss=0.3542, pruned_loss=0.08024, ctc_loss=0.1512, over 16795.00 frames. ], tot_loss[loss=0.2444, simple_loss=0.3098, pruned_loss=0.06613, ctc_loss=0.1171, over 2353974.88 frames. ], batch size: 328, lr: 2.60e-03, grad_scale: 2.0
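The optim.py:471 entries report the min/25%/50%/75%/max of recent gradient norms. In these logs the printed threshold equals Clipping_scale times the median, and percent-clipped is the share of recent batches whose gradient norm exceeded it:

    # Observation about these log lines (not a statement of the optimizer internals):
    quartiles = [2.502e+02, 3.044e+02, 3.577e+02, 4.251e+02, 7.739e+02]
    clipping_scale = 2.0
    print(clipping_scale * quartiles[2])  # 715.4, matching threshold=7.154e+02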
2023-10-09 10:28:37,859 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2729916.0, ans=0.1
2023-10-09 10:28:41,907 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2729916.0, ans=0.125
2023-10-09 10:28:46,371 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2729916.0, ans=0.125
2023-10-09 10:29:13,738 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2730056.0, ans=0.1
2023-10-09 10:29:19,975 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=15.0
2023-10-09 10:29:24,325 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2730056.0, ans=0.09899494936611666
2023-10-09 10:29:37,208 INFO [train.py:1031] (2/4) Epoch 14, batch 300, loss[loss=0.2223, simple_loss=0.2817, pruned_loss=0.05997, ctc_loss=0.1073, over 16746.00 frames. ], tot_loss[loss=0.2392, simple_loss=0.3049, pruned_loss=0.06404, ctc_loss=0.1136, over 2553471.73 frames. ], batch size: 140, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:29:55,809 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2730196.0, ans=0.125
2023-10-09 10:29:56,420 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.367e+02 3.126e+02 3.650e+02 4.282e+02 7.513e+02, threshold=7.299e+02, percent-clipped=1.0
2023-10-09 10:30:38,037 INFO [train.py:1031] (2/4) Epoch 14, batch 350, loss[loss=0.2393, simple_loss=0.2753, pruned_loss=0.07454, ctc_loss=0.1356, over 15273.00 frames. ], tot_loss[loss=0.2422, simple_loss=0.3043, pruned_loss=0.06659, ctc_loss=0.1175, over 2717857.78 frames. ], batch size: 527, lr: 2.60e-03, grad_scale: 4.0
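The scaling.py:199 ScheduledFloat entries trace regularization hyperparameters (dropout probabilities, balancer probs, skip rates, bypass scale bounds) that are scheduled on the global batch count; 'ans' is the value in effect at that batch_count. A minimal sketch of a piecewise-linear schedule of that flavor, assuming (batch_count, value) breakpoints (an illustration, not the zipformer ScheduledFloat class):

    import bisect

    def scheduled_float(batch_count, points):
        # points: sorted (batch_count, value) breakpoints; linear in between,
        # clamped at both ends.
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        if batch_count <= xs[0]:
            return ys[0]
        if batch_count >= xs[-1]:
            return ys[-1]
        i = bisect.bisect_right(xs, batch_count)
        t = (batch_count - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches has
    # long since reached its final value at batch_count ~2.73e6:
    print(scheduled_float(2729916.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1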
2023-10-09 10:30:45,190 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2730382.6666666665, ans=0.0
2023-10-09 10:30:49,573 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2730429.3333333335, ans=0.125
2023-10-09 10:31:02,236 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=2730476.0, ans=10.0
2023-10-09 10:31:04,999 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2730476.0, ans=0.125
2023-10-09 10:31:23,678 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2730522.6666666665, ans=0.125
2023-10-09 10:31:30,393 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2730569.3333333335, ans=0.125
2023-10-09 10:31:33,640 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2730569.3333333335, ans=0.125
2023-10-09 10:31:33,705 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2730569.3333333335, ans=0.125
2023-10-09 10:31:37,395 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2730616.0, ans=0.125
2023-10-09 10:31:38,085 INFO [train.py:1031] (2/4) Epoch 14, batch 400, loss[loss=0.2433, simple_loss=0.3058, pruned_loss=0.06842, ctc_loss=0.1102, over 16987.00 frames. ], tot_loss[loss=0.2406, simple_loss=0.2991, pruned_loss=0.06738, ctc_loss=0.1183, over 2859669.35 frames. ], batch size: 86, lr: 2.60e-03, grad_scale: 8.0
2023-10-09 10:31:42,708 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0
2023-10-09 10:31:46,077 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2730616.0, ans=0.09899494936611666
2023-10-09 10:31:46,079 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:31:57,468 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.680e+02 3.285e+02 3.968e+02 4.685e+02 8.332e+02, threshold=7.936e+02, percent-clipped=1.0
2023-10-09 10:32:05,945 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2730709.3333333335, ans=0.0
2023-10-09 10:32:13,197 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=12.0
2023-10-09 10:32:13,994 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2730756.0, ans=0.125
2023-10-09 10:32:30,193 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2730802.6666666665, ans=0.125
2023-10-09 10:32:35,321 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.10 vs. limit=10.0
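The scaling.py:979 Whitening entries compare a module's whitening metric against its limit (e.g. metric=4.10 vs. limit=10.0 just above). The metric is near 1.0 when feature channels are decorrelated with equal variance and grows toward num_channels as the covariance concentrates; a corrective penalty applies only when it exceeds the limit. A hedged sketch of a metric with those properties (assumed form, not the scaling.py source):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (frames, channels). Returns 1.0 for perfectly "white" features,
        # up to num_channels when the covariance is rank-1.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        num_channels = x.shape[1]
        return num_channels * (cov * cov).sum() / cov.trace() ** 2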
2023-10-09 10:32:39,447 INFO [train.py:1031] (2/4) Epoch 14, batch 450, loss[loss=0.2378, simple_loss=0.3327, pruned_loss=0.05198, ctc_loss=0.09749, over 16328.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.298, pruned_loss=0.06648, ctc_loss=0.1169, over 2961996.99 frames. ], batch size: 463, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:32:48,991 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2730849.3333333335, ans=0.0
2023-10-09 10:32:58,788 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=2730896.0, ans=15.0
2023-10-09 10:33:21,264 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2730989.3333333335, ans=0.125
2023-10-09 10:33:21,552 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0
2023-10-09 10:33:27,544 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0
2023-10-09 10:33:40,961 INFO [train.py:1031] (2/4) Epoch 14, batch 500, loss[loss=0.2363, simple_loss=0.2705, pruned_loss=0.07571, ctc_loss=0.1266, over 16720.00 frames. ], tot_loss[loss=0.233, simple_loss=0.2917, pruned_loss=0.06446, ctc_loss=0.1134, over 3040677.17 frames. ], batch size: 328, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:34:00,456 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+02 3.135e+02 3.674e+02 4.514e+02 8.848e+02, threshold=7.348e+02, percent-clipped=4.0
2023-10-09 10:34:07,956 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=2731176.0, ans=0.95
2023-10-09 10:34:17,127 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2731222.6666666665, ans=0.0
2023-10-09 10:34:22,669 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:34:40,421 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2731316.0, ans=0.125
2023-10-09 10:34:41,181 INFO [train.py:1031] (2/4) Epoch 14, batch 550, loss[loss=0.2064, simple_loss=0.2573, pruned_loss=0.05763, ctc_loss=0.1005, over 16811.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2828, pruned_loss=0.06359, ctc_loss=0.1115, over 3094632.49 frames. ], batch size: 243, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:34:54,628 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2731362.6666666665, ans=0.125
2023-10-09 10:34:58,847 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2731362.6666666665, ans=0.125
2023-10-09 10:35:24,695 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2731456.0, ans=0.1
2023-10-09 10:35:24,731 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=2731456.0, ans=0.02
2023-10-09 10:35:24,767 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=2731456.0, ans=0.1
2023-10-09 10:35:42,188 INFO [train.py:1031] (2/4) Epoch 14, batch 600, loss[loss=0.1981, simple_loss=0.2445, pruned_loss=0.05594, ctc_loss=0.09929, over 16817.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2774, pruned_loss=0.06302, ctc_loss=0.1105, over 3131551.93 frames. ], batch size: 189, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:36:01,126 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2731596.0, ans=0.125
2023-10-09 10:36:02,848 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.272e+02 3.049e+02 3.429e+02 4.091e+02 7.448e+02, threshold=6.859e+02, percent-clipped=1.0
2023-10-09 10:36:18,499 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2731689.3333333335, ans=0.125
2023-10-09 10:36:28,796 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2731689.3333333335, ans=0.0
2023-10-09 10:36:37,971 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2731736.0, ans=0.0
2023-10-09 10:36:43,666 INFO [train.py:1031] (2/4) Epoch 14, batch 650, loss[loss=0.2014, simple_loss=0.2549, pruned_loss=0.05511, ctc_loss=0.09419, over 16778.00 frames. ], tot_loss[loss=0.2203, simple_loss=0.2722, pruned_loss=0.06237, ctc_loss=0.1093, over 3173553.06 frames. ], batch size: 215, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:36:46,058 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2731782.6666666665, ans=0.1
2023-10-09 10:37:02,452 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2731829.3333333335, ans=0.125
2023-10-09 10:37:18,874 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2731922.6666666665, ans=0.0
2023-10-09 10:37:19,215 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=12.0
2023-10-09 10:37:36,637 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0
2023-10-09 10:37:43,522 INFO [train.py:1031] (2/4) Epoch 14, batch 700, loss[loss=0.1828, simple_loss=0.2647, pruned_loss=0.0375, ctc_loss=0.06486, over 16808.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2692, pruned_loss=0.05952, ctc_loss=0.1044, over 3197929.44 frames. ], batch size: 188, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:37:57,521 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2732062.6666666665, ans=0.1
2023-10-09 10:38:01,503 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=17.65 vs. limit=15.0
2023-10-09 10:38:05,622 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2732062.6666666665, ans=0.1
2023-10-09 10:38:06,355 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.869e+02 3.199e+02 3.835e+02 8.884e+02, threshold=6.398e+02, percent-clipped=1.0
2023-10-09 10:38:06,799 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2732109.3333333335, ans=0.2
2023-10-09 10:38:07,808 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2732109.3333333335, ans=0.125
2023-10-09 10:38:12,291 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2732109.3333333335, ans=0.1
2023-10-09 10:38:12,295 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2732109.3333333335, ans=0.2
2023-10-09 10:38:19,331 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2732156.0, ans=0.0
2023-10-09 10:38:24,952 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2732156.0, ans=0.0
2023-10-09 10:38:37,793 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2732202.6666666665, ans=0.1
2023-10-09 10:38:44,812 INFO [train.py:1031] (2/4) Epoch 14, batch 750, loss[loss=0.2202, simple_loss=0.273, pruned_loss=0.06318, ctc_loss=0.1026, over 16601.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2776, pruned_loss=0.05843, ctc_loss=0.1038, over 3200510.72 frames. ], batch size: 110, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:38:47,313 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2732249.3333333335, ans=0.125
2023-10-09 10:38:56,232 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2732249.3333333335, ans=0.125
2023-10-09 10:39:04,957 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2732296.0, ans=0.125
2023-10-09 10:39:31,961 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2732389.3333333335, ans=0.0
2023-10-09 10:39:37,046 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2732436.0, ans=0.0
2023-10-09 10:39:38,146 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2732436.0, ans=0.09899494936611666
2023-10-09 10:39:41,352 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2732436.0, ans=0.0
2023-10-09 10:39:44,025 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=12.0
2023-10-09 10:39:48,673 INFO [train.py:1031] (2/4) Epoch 14, batch 800, loss[loss=0.2444, simple_loss=0.3144, pruned_loss=0.06384, ctc_loss=0.1167, over 16800.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2923, pruned_loss=0.06132, ctc_loss=0.1093, over 3222998.23 frames. ], batch size: 176, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:39:50,981 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.69 vs. limit=15.0
2023-10-09 10:39:54,011 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2732482.6666666665, ans=0.125
2023-10-09 10:39:55,598 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=22.5
2023-10-09 10:40:12,733 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.375e+02 4.289e+02 5.326e+02 8.856e+02, threshold=8.578e+02, percent-clipped=11.0
2023-10-09 10:40:13,202 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:40:50,024 INFO [train.py:1031] (2/4) Epoch 14, batch 850, loss[loss=0.2281, simple_loss=0.3378, pruned_loss=0.04385, ctc_loss=0.07671, over 15047.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2986, pruned_loss=0.06108, ctc_loss=0.1092, over 3230800.45 frames. ], batch size: 526, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:41:06,096 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0
2023-10-09 10:41:07,765 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2732762.6666666665, ans=0.125
2023-10-09 10:41:38,618 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2732902.6666666665, ans=0.1
2023-10-09 10:41:49,387 INFO [train.py:1031] (2/4) Epoch 14, batch 900, loss[loss=0.2638, simple_loss=0.3033, pruned_loss=0.08313, ctc_loss=0.145, over 16631.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2976, pruned_loss=0.06159, ctc_loss=0.1094, over 3237451.49 frames. ], batch size: 351, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:41:59,821 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2732949.3333333335, ans=0.0
2023-10-09 10:42:02,826 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.20 vs. limit=15.0
2023-10-09 10:42:08,918 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2732996.0, ans=0.125
2023-10-09 10:42:12,342 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2732996.0, ans=0.1
2023-10-09 10:42:17,044 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.380e+02 3.289e+02 4.047e+02 4.926e+02 9.646e+02, threshold=8.093e+02, percent-clipped=3.0
2023-10-09 10:42:28,529 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2733089.3333333335, ans=0.0
2023-10-09 10:42:31,319 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2733089.3333333335, ans=0.0
2023-10-09 10:42:35,781 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0
2023-10-09 10:42:45,586 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2733136.0, ans=0.0
2023-10-09 10:42:51,355 INFO [train.py:1031] (2/4) Epoch 14, batch 950, loss[loss=0.2692, simple_loss=0.3325, pruned_loss=0.07717, ctc_loss=0.1286, over 16888.00 frames. ], tot_loss[loss=0.2344, simple_loss=0.2965, pruned_loss=0.06361, ctc_loss=0.1127, over 3255273.76 frames. ], batch size: 228, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:42:53,038 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=15.0
2023-10-09 10:43:08,715 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2733229.3333333335, ans=0.1
2023-10-09 10:43:24,369 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2733276.0, ans=0.125
2023-10-09 10:43:29,614 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=2733322.6666666665, ans=0.025
2023-10-09 10:43:41,298 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2733369.3333333335, ans=0.125
2023-10-09 10:43:49,782 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2733369.3333333335, ans=0.1
2023-10-09 10:43:51,559 INFO [train.py:1031] (2/4) Epoch 14, batch 1000, loss[loss=0.2317, simple_loss=0.3, pruned_loss=0.05997, ctc_loss=0.1087, over 16847.00 frames. ], tot_loss[loss=0.2404, simple_loss=0.3025, pruned_loss=0.0659, ctc_loss=0.1163, over 3272459.47 frames. ], batch size: 215, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:44:01,998 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2733416.0, ans=0.2
2023-10-09 10:44:08,290 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2733462.6666666665, ans=0.0
2023-10-09 10:44:18,369 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.732e+02 3.383e+02 4.196e+02 5.191e+02 1.287e+03, threshold=8.392e+02, percent-clipped=5.0
2023-10-09 10:44:19,332 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.13 vs. limit=10.0
2023-10-09 10:44:23,235 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0
2023-10-09 10:44:35,996 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2733556.0, ans=0.125
2023-10-09 10:44:45,052 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2733602.6666666665, ans=0.125
2023-10-09 10:44:47,787 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2733602.6666666665, ans=0.125
2023-10-09 10:44:52,742 INFO [train.py:1031] (2/4) Epoch 14, batch 1050, loss[loss=0.2675, simple_loss=0.2959, pruned_loss=0.08903, ctc_loss=0.1525, over 16486.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2961, pruned_loss=0.06494, ctc_loss=0.1142, over 3272703.48 frames. ], batch size: 350, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:45:10,406 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2733696.0, ans=0.1
2023-10-09 10:45:14,854 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2733696.0, ans=0.025
2023-10-09 10:45:23,967 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2733742.6666666665, ans=0.0
2023-10-09 10:45:25,980 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2733742.6666666665, ans=0.1
2023-10-09 10:45:52,760 INFO [train.py:1031] (2/4) Epoch 14, batch 1100, loss[loss=0.2065, simple_loss=0.2645, pruned_loss=0.05479, ctc_loss=0.09703, over 16822.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.2911, pruned_loss=0.06502, ctc_loss=0.1139, over 3285470.82 frames. ], batch size: 102, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:46:09,646 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2733929.3333333335, ans=0.125
2023-10-09 10:46:09,847 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.59 vs. limit=10.0
2023-10-09 10:46:21,066 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.487e+02 3.252e+02 3.598e+02 4.155e+02 7.430e+02, threshold=7.195e+02, percent-clipped=0.0
2023-10-09 10:46:21,301 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2733976.0, ans=0.2
2023-10-09 10:46:22,495 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2733976.0, ans=0.125
2023-10-09 10:46:43,343 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2734069.3333333335, ans=0.125
2023-10-09 10:46:44,514 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0
2023-10-09 10:46:52,570 INFO [train.py:1031] (2/4) Epoch 14, batch 1150, loss[loss=0.1778, simple_loss=0.2341, pruned_loss=0.04568, ctc_loss=0.07538, over 16624.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2852, pruned_loss=0.06415, ctc_loss=0.1125, over 3291878.92 frames. ], batch size: 151, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:47:04,401 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0
2023-10-09 10:47:06,075 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2734162.6666666665, ans=0.0
2023-10-09 10:47:09,507 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.55 vs. limit=12.0
2023-10-09 10:47:22,506 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2734209.3333333335, ans=0.2
2023-10-09 10:47:29,374 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2734256.0, ans=0.125
2023-10-09 10:47:37,120 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.07 vs. limit=15.0
2023-10-09 10:47:37,386 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0
2023-10-09 10:47:48,347 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2734302.6666666665, ans=0.1
2023-10-09 10:47:50,440 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2734349.3333333335, ans=0.125
2023-10-09 10:47:51,156 INFO [train.py:1031] (2/4) Epoch 14, batch 1200, loss[loss=0.2243, simple_loss=0.2653, pruned_loss=0.06822, ctc_loss=0.1173, over 16598.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2788, pruned_loss=0.06338, ctc_loss=0.111, over 3300003.65 frames. ], batch size: 110, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:47:52,969 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2734349.3333333335, ans=0.0
2023-10-09 10:47:54,713 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2734349.3333333335, ans=0.05
2023-10-09 10:48:06,286 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2734396.0, ans=0.0
2023-10-09 10:48:14,902 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2734442.6666666665, ans=0.0
2023-10-09 10:48:15,886 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2734442.6666666665, ans=0.2
2023-10-09 10:48:20,270 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+02 2.978e+02 3.440e+02 3.913e+02 6.490e+02, threshold=6.880e+02, percent-clipped=0.0
2023-10-09 10:48:27,111 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0
2023-10-09 10:48:28,890 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2734489.3333333335, ans=0.025
2023-10-09 10:48:51,818 INFO [train.py:1031] (2/4) Epoch 14, batch 1250, loss[loss=0.2506, simple_loss=0.2915, pruned_loss=0.07647, ctc_loss=0.142, over 16972.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2786, pruned_loss=0.06496, ctc_loss=0.1136, over 3297919.13 frames. ], batch size: 228, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:48:59,703 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=22.5
2023-10-09 10:49:09,914 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:49:23,983 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2734676.0, ans=0.125
2023-10-09 10:49:25,185 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2734676.0, ans=0.125
2023-10-09 10:49:28,979 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2734722.6666666665, ans=0.04949747468305833
2023-10-09 10:49:35,612 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=22.5
2023-10-09 10:49:53,653 INFO [train.py:1031] (2/4) Epoch 14, batch 1300, loss[loss=0.2238, simple_loss=0.273, pruned_loss=0.06489, ctc_loss=0.112, over 16812.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.2786, pruned_loss=0.06543, ctc_loss=0.1147, over 3309890.82 frames. ], batch size: 176, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:49:57,939 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2734816.0, ans=10.0
2023-10-09 10:50:00,935 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2734816.0, ans=0.04949747468305833
2023-10-09 10:50:25,186 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.632e+02 3.518e+02 3.904e+02 4.606e+02 8.060e+02, threshold=7.809e+02, percent-clipped=2.0
2023-10-09 10:50:37,264 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2734956.0, ans=0.125
2023-10-09 10:50:39,365 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2734956.0, ans=0.09899494936611666
2023-10-09 10:50:52,183 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2735002.6666666665, ans=0.125
2023-10-09 10:50:54,947 INFO [train.py:1031] (2/4) Epoch 14, batch 1350, loss[loss=0.2211, simple_loss=0.2776, pruned_loss=0.06196, ctc_loss=0.1017, over 16918.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2778, pruned_loss=0.06588, ctc_loss=0.1154, over 3318105.09 frames. ], batch size: 82, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:50:57,881 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2735049.3333333335, ans=0.0
2023-10-09 10:51:02,891 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2735049.3333333335, ans=0.0
2023-10-09 10:51:02,916 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2735049.3333333335, ans=0.2
2023-10-09 10:51:16,050 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2735096.0, ans=0.0
2023-10-09 10:51:16,599 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.37 vs. limit=15.0
2023-10-09 10:51:17,017 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2735096.0, ans=0.0
2023-10-09 10:51:20,848 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2735142.6666666665, ans=0.0
2023-10-09 10:51:28,282 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2735142.6666666665, ans=0.125
2023-10-09 10:51:36,529 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0
2023-10-09 10:51:37,692 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.80 vs. limit=12.0
2023-10-09 10:51:38,227 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2735189.3333333335, ans=0.125
2023-10-09 10:51:39,243 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2735189.3333333335, ans=0.125
2023-10-09 10:51:55,993 INFO [train.py:1031] (2/4) Epoch 14, batch 1400, loss[loss=0.2218, simple_loss=0.2739, pruned_loss=0.06266, ctc_loss=0.1108, over 16901.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2734, pruned_loss=0.06537, ctc_loss=0.114, over 3314212.83 frames. ], batch size: 82, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:51:57,561 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2735282.6666666665, ans=0.2
2023-10-09 10:51:57,562 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2735282.6666666665, ans=0.125
2023-10-09 10:52:08,788 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2735329.3333333335, ans=0.0
2023-10-09 10:52:16,812 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2735329.3333333335, ans=0.125
2023-10-09 10:52:27,996 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+02 3.259e+02 3.795e+02 4.545e+02 1.175e+03, threshold=7.590e+02, percent-clipped=1.0
2023-10-09 10:52:32,080 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.28 vs. limit=12.0
2023-10-09 10:52:41,840 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2735422.6666666665, ans=0.0
2023-10-09 10:52:55,892 INFO [train.py:1031] (2/4) Epoch 14, batch 1450, loss[loss=0.2042, simple_loss=0.3027, pruned_loss=0.03898, ctc_loss=0.06915, over 15196.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2747, pruned_loss=0.06293, ctc_loss=0.11, over 3314973.88 frames. ], batch size: 526, lr: 2.60e-03, grad_scale: 2.0
2023-10-09 10:52:58,869 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2735516.0, ans=0.125
2023-10-09 10:53:12,600 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0
2023-10-09 10:53:18,476 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0
2023-10-09 10:53:28,504 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2735609.3333333335, ans=0.125
2023-10-09 10:53:36,795 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2735656.0, ans=0.125
2023-10-09 10:53:40,893 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.81 vs. limit=15.0
2023-10-09 10:53:47,250 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2735702.6666666665, ans=0.125
2023-10-09 10:53:51,191 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0
2023-10-09 10:53:57,018 INFO [train.py:1031] (2/4) Epoch 14, batch 1500, loss[loss=0.2453, simple_loss=0.2921, pruned_loss=0.07301, ctc_loss=0.1312, over 16792.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2775, pruned_loss=0.06341, ctc_loss=0.1112, over 3315940.38 frames. ], batch size: 271, lr: 2.60e-03, grad_scale: 4.0
2023-10-09 10:54:03,069 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2735749.3333333335, ans=0.125
2023-10-09 10:54:32,302 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.256e+02 3.308e+02 3.843e+02 4.778e+02 1.080e+03, threshold=7.686e+02, percent-clipped=1.0
2023-10-09 10:54:44,304 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2735889.3333333335, ans=6.0
2023-10-09 10:54:46,803 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2735936.0, ans=0.1
2023-10-09 10:54:47,285 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0
2023-10-09 10:55:00,101 INFO [train.py:1031] (2/4) Epoch 14, batch 1550, loss[loss=0.2211, simple_loss=0.2898, pruned_loss=0.05509, ctc_loss=0.1056, over 16910.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2768, pruned_loss=0.06207, ctc_loss=0.1092, over 3312620.53 frames. ], batch size: 292, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:55:02,164 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2735982.6666666665, ans=0.0
2023-10-09 10:55:13,702 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2736029.3333333335, ans=0.125
2023-10-09 10:55:31,912 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2736076.0, ans=0.0
2023-10-09 10:55:37,782 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0
2023-10-09 10:55:46,379 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2736122.6666666665, ans=0.125
2023-10-09 10:55:49,879 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.31 vs. limit=15.0
2023-10-09 10:55:53,222 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:56:01,568 INFO [train.py:1031] (2/4) Epoch 14, batch 1600, loss[loss=0.2332, simple_loss=0.2953, pruned_loss=0.06233, ctc_loss=0.1163, over 16571.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2754, pruned_loss=0.05859, ctc_loss=0.1037, over 3300996.07 frames. ], batch size: 351, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 10:56:04,812 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=22.5
2023-10-09 10:56:14,488 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2736262.6666666665, ans=0.125
2023-10-09 10:56:15,833 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0
2023-10-09 10:56:16,923 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=22.5
2023-10-09 10:56:18,669 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2736262.6666666665, ans=0.0
2023-10-09 10:56:22,361 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2736262.6666666665, ans=0.1
2023-10-09 10:56:24,901 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0
2023-10-09 10:56:36,432 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+02 2.648e+02 3.123e+02 3.834e+02 1.151e+03, threshold=6.247e+02, percent-clipped=2.0
2023-10-09 10:56:58,179 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2736402.6666666665, ans=0.1
2023-10-09 10:57:01,604 INFO [train.py:1031] (2/4) Epoch 14, batch 1650, loss[loss=0.2399, simple_loss=0.3022, pruned_loss=0.06535, ctc_loss=0.1175, over 16723.00 frames. ], tot_loss[loss=0.22, simple_loss=0.2762, pruned_loss=0.0605, ctc_loss=0.1069, over 3289427.98 frames. ], batch size: 271, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:57:02,170 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=22.5
2023-10-09 10:57:03,083 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2736449.3333333335, ans=0.0
2023-10-09 10:57:03,124 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2736449.3333333335, ans=0.1
2023-10-09 10:57:15,598 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0
2023-10-09 10:57:17,034 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0
2023-10-09 10:57:27,200 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2736542.6666666665, ans=0.0
2023-10-09 10:57:29,938 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 10:57:33,094 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2736542.6666666665, ans=0.09899494936611666
2023-10-09 10:57:33,109 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2736542.6666666665, ans=0.1
2023-10-09 10:57:40,583 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2736589.3333333335, ans=0.0
2023-10-09 10:58:03,241 INFO [train.py:1031] (2/4) Epoch 14, batch 1700, loss[loss=0.2324, simple_loss=0.2927, pruned_loss=0.06417, ctc_loss=0.1095, over 16858.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2831, pruned_loss=0.06432, ctc_loss=0.113, over 3299740.86 frames. ], batch size: 202, lr: 2.59e-03, grad_scale: 4.0
2023-10-09 10:58:08,838 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0
2023-10-09 10:58:29,566 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2736776.0, ans=0.2
2023-10-09 10:58:30,591 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2736776.0, ans=0.0
2023-10-09 10:58:38,774 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+02 3.292e+02 3.848e+02 4.651e+02 1.016e+03, threshold=7.697e+02, percent-clipped=4.0
2023-10-09 10:58:42,739 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2736822.6666666665, ans=0.125
2023-10-09 10:58:47,174 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2736822.6666666665, ans=0.0
2023-10-09 10:58:50,961 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2736822.6666666665, ans=0.125
2023-10-09 10:59:04,568 INFO [train.py:1031] (2/4) Epoch 14, batch 1750, loss[loss=0.2214, simple_loss=0.2774, pruned_loss=0.06117, ctc_loss=0.1076, over 16912.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2856, pruned_loss=0.06543, ctc_loss=0.1152, over 3299556.56 frames. ], batch size: 215, lr: 2.59e-03, grad_scale: 2.0
2023-10-09 10:59:16,576 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0
limit=15.0 2023-10-09 10:59:33,611 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2737009.3333333335, ans=0.125 2023-10-09 10:59:43,470 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2737056.0, ans=0.125 2023-10-09 11:00:05,550 INFO [train.py:1031] (2/4) Epoch 14, batch 1800, loss[loss=0.242, simple_loss=0.3292, pruned_loss=0.05612, ctc_loss=0.1064, over 15202.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2852, pruned_loss=0.06449, ctc_loss=0.1138, over 3298501.19 frames. ], batch size: 527, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:00:15,166 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2737149.3333333335, ans=0.0 2023-10-09 11:00:16,165 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2737149.3333333335, ans=0.125 2023-10-09 11:00:43,557 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 2.917e+02 3.383e+02 3.800e+02 1.043e+03, threshold=6.767e+02, percent-clipped=1.0 2023-10-09 11:00:43,896 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2737289.3333333335, ans=0.2 2023-10-09 11:01:06,589 INFO [train.py:1031] (2/4) Epoch 14, batch 1850, loss[loss=0.2045, simple_loss=0.2705, pruned_loss=0.05128, ctc_loss=0.08992, over 16713.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2858, pruned_loss=0.06219, ctc_loss=0.11, over 3302223.17 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:01:08,048 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:01:10,229 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2737382.6666666665, ans=0.125 2023-10-09 11:01:11,718 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2737382.6666666665, ans=0.125 2023-10-09 11:01:33,029 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2023-10-09 11:01:36,638 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2737476.0, ans=0.125 2023-10-09 11:01:58,985 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2023-10-09 11:02:00,929 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.68 vs. limit=22.5 2023-10-09 11:02:04,618 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2737569.3333333335, ans=0.0 2023-10-09 11:02:06,432 INFO [train.py:1031] (2/4) Epoch 14, batch 1900, loss[loss=0.2244, simple_loss=0.2972, pruned_loss=0.0553, ctc_loss=0.1024, over 16771.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2868, pruned_loss=0.06215, ctc_loss=0.1094, over 3308955.64 frames. 
], batch size: 188, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:02:18,628 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2737662.6666666665, ans=0.0 2023-10-09 11:02:19,034 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2023-10-09 11:02:43,659 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.102e+02 3.624e+02 4.440e+02 7.780e+02, threshold=7.248e+02, percent-clipped=1.0 2023-10-09 11:02:51,940 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2737756.0, ans=0.1 2023-10-09 11:03:06,285 INFO [train.py:1031] (2/4) Epoch 14, batch 1950, loss[loss=0.2194, simple_loss=0.2723, pruned_loss=0.06233, ctc_loss=0.1045, over 16761.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2884, pruned_loss=0.06216, ctc_loss=0.1094, over 3299665.22 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:03:48,295 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2737989.3333333335, ans=0.0 2023-10-09 11:04:08,693 INFO [train.py:1031] (2/4) Epoch 14, batch 2000, loss[loss=0.2445, simple_loss=0.2926, pruned_loss=0.07469, ctc_loss=0.1176, over 16728.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2903, pruned_loss=0.06369, ctc_loss=0.112, over 3307038.37 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:04:13,523 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2023-10-09 11:04:18,183 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2738082.6666666665, ans=0.1 2023-10-09 11:04:29,075 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2738129.3333333335, ans=0.0 2023-10-09 11:04:48,440 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.567e+02 3.379e+02 3.817e+02 4.663e+02 9.562e+02, threshold=7.635e+02, percent-clipped=5.0 2023-10-09 11:05:01,166 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2738269.3333333335, ans=0.025 2023-10-09 11:05:07,883 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=12.0 2023-10-09 11:05:09,397 INFO [train.py:1031] (2/4) Epoch 14, batch 2050, loss[loss=0.2209, simple_loss=0.2881, pruned_loss=0.05766, ctc_loss=0.096, over 16794.00 frames. ], tot_loss[loss=0.2375, simple_loss=0.2952, pruned_loss=0.06649, ctc_loss=0.1172, over 3304796.80 frames. 
], batch size: 102, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:05:09,742 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2738316.0, ans=0.2 2023-10-09 11:05:14,440 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2738316.0, ans=0.125 2023-10-09 11:05:15,457 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2738316.0, ans=0.125 2023-10-09 11:05:27,917 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2738362.6666666665, ans=0.2 2023-10-09 11:06:05,114 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=22.5 2023-10-09 11:06:10,927 INFO [train.py:1031] (2/4) Epoch 14, batch 2100, loss[loss=0.2435, simple_loss=0.2875, pruned_loss=0.07458, ctc_loss=0.126, over 16743.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.2928, pruned_loss=0.06646, ctc_loss=0.1171, over 3309962.64 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:06:53,246 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+02 3.150e+02 3.651e+02 4.552e+02 6.884e+02, threshold=7.301e+02, percent-clipped=0.0 2023-10-09 11:07:08,855 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.47 vs. limit=15.0 2023-10-09 11:07:11,462 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2738736.0, ans=0.0 2023-10-09 11:07:11,556 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2738736.0, ans=0.0 2023-10-09 11:07:12,618 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2738782.6666666665, ans=0.0 2023-10-09 11:07:13,886 INFO [train.py:1031] (2/4) Epoch 14, batch 2150, loss[loss=0.2039, simple_loss=0.2471, pruned_loss=0.05961, ctc_loss=0.1038, over 16509.00 frames. ], tot_loss[loss=0.2339, simple_loss=0.2926, pruned_loss=0.06464, ctc_loss=0.1145, over 3307625.42 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:07:34,939 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=22.5 2023-10-09 11:07:42,123 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=22.5 2023-10-09 11:08:06,269 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=2738969.3333333335, ans=12.0 2023-10-09 11:08:14,607 INFO [train.py:1031] (2/4) Epoch 14, batch 2200, loss[loss=0.2049, simple_loss=0.2694, pruned_loss=0.05249, ctc_loss=0.08852, over 16935.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2904, pruned_loss=0.06422, ctc_loss=0.1137, over 3307863.40 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:08:15,311 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.43 vs. 
limit=22.5 2023-10-09 11:08:27,036 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2739062.6666666665, ans=0.125 2023-10-09 11:08:33,484 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2739062.6666666665, ans=0.2 2023-10-09 11:08:44,745 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2739109.3333333335, ans=0.125 2023-10-09 11:08:55,207 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2739156.0, ans=0.0 2023-10-09 11:08:58,372 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 3.214e+02 3.681e+02 4.721e+02 1.015e+03, threshold=7.363e+02, percent-clipped=4.0 2023-10-09 11:09:00,849 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2739156.0, ans=0.125 2023-10-09 11:09:16,599 INFO [train.py:1031] (2/4) Epoch 14, batch 2250, loss[loss=0.188, simple_loss=0.2411, pruned_loss=0.04965, ctc_loss=0.08934, over 16829.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2853, pruned_loss=0.06356, ctc_loss=0.1124, over 3307854.63 frames. ], batch size: 164, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:09:39,950 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2739296.0, ans=0.125 2023-10-09 11:10:18,410 INFO [train.py:1031] (2/4) Epoch 14, batch 2300, loss[loss=0.2479, simple_loss=0.3011, pruned_loss=0.07249, ctc_loss=0.1243, over 16735.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2788, pruned_loss=0.06293, ctc_loss=0.1111, over 3301450.89 frames. ], batch size: 309, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:10:24,032 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:10:30,968 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2739529.3333333335, ans=0.125 2023-10-09 11:10:43,475 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2023-10-09 11:10:48,997 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=22.5 2023-10-09 11:11:03,965 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2739622.6666666665, ans=0.0 2023-10-09 11:11:04,760 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.458e+02 3.279e+02 3.727e+02 4.728e+02 7.971e+02, threshold=7.454e+02, percent-clipped=1.0 2023-10-09 11:11:08,920 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2739669.3333333335, ans=0.125 2023-10-09 11:11:15,462 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2739669.3333333335, ans=0.1 2023-10-09 11:11:21,217 INFO [train.py:1031] (2/4) Epoch 14, batch 2350, loss[loss=0.3367, simple_loss=0.3565, pruned_loss=0.117, ctc_loss=0.2074, over 16680.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2818, pruned_loss=0.06484, ctc_loss=0.1144, over 3303131.02 frames. 
], batch size: 384, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 11:11:30,067 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2739716.0, ans=0.2 2023-10-09 11:11:32,095 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.07 vs. limit=15.0 2023-10-09 11:11:39,031 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2739762.6666666665, ans=15.0 2023-10-09 11:11:43,185 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2739762.6666666665, ans=0.125 2023-10-09 11:11:43,341 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0 2023-10-09 11:12:18,321 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2739902.6666666665, ans=0.0 2023-10-09 11:12:22,657 INFO [train.py:1031] (2/4) Epoch 14, batch 2400, loss[loss=0.2272, simple_loss=0.2786, pruned_loss=0.06509, ctc_loss=0.1143, over 16838.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2845, pruned_loss=0.06679, ctc_loss=0.1172, over 3296542.83 frames. ], batch size: 328, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:12:31,188 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2739949.3333333335, ans=0.125 2023-10-09 11:12:42,725 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2739996.0, ans=0.125 2023-10-09 11:13:02,295 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2740089.3333333335, ans=0.0 2023-10-09 11:13:03,344 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2740089.3333333335, ans=0.2 2023-10-09 11:13:09,494 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.813e+02 3.337e+02 3.917e+02 4.663e+02 1.051e+03, threshold=7.833e+02, percent-clipped=2.0 2023-10-09 11:13:15,717 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2740136.0, ans=0.125 2023-10-09 11:13:25,551 INFO [train.py:1031] (2/4) Epoch 14, batch 2450, loss[loss=0.1838, simple_loss=0.2638, pruned_loss=0.0379, ctc_loss=0.06994, over 16894.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2829, pruned_loss=0.06659, ctc_loss=0.1167, over 3304389.63 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:14:22,178 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2740369.3333333335, ans=0.1 2023-10-09 11:14:28,352 INFO [train.py:1031] (2/4) Epoch 14, batch 2500, loss[loss=0.1824, simple_loss=0.2631, pruned_loss=0.0369, ctc_loss=0.06981, over 16782.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.279, pruned_loss=0.06209, ctc_loss=0.1092, over 3308114.57 frames. 
], batch size: 164, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:14:30,857 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2740416.0, ans=0.125 2023-10-09 11:14:32,324 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=22.5 2023-10-09 11:14:35,568 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:14:42,047 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2740462.6666666665, ans=0.125 2023-10-09 11:14:54,288 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2740509.3333333335, ans=0.125 2023-10-09 11:15:17,464 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.820e+02 3.218e+02 3.802e+02 1.081e+03, threshold=6.436e+02, percent-clipped=2.0 2023-10-09 11:15:32,460 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2740649.3333333335, ans=0.125 2023-10-09 11:15:32,751 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2023-10-09 11:15:33,264 INFO [train.py:1031] (2/4) Epoch 14, batch 2550, loss[loss=0.264, simple_loss=0.3265, pruned_loss=0.07573, ctc_loss=0.125, over 16814.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2807, pruned_loss=0.06135, ctc_loss=0.107, over 3298050.74 frames. ], batch size: 328, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:16:35,624 INFO [train.py:1031] (2/4) Epoch 14, batch 2600, loss[loss=0.2248, simple_loss=0.2997, pruned_loss=0.05449, ctc_loss=0.1023, over 15182.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2803, pruned_loss=0.0615, ctc_loss=0.107, over 3293026.62 frames. ], batch size: 527, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:17:05,731 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2740976.0, ans=0.125 2023-10-09 11:17:23,702 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.364e+02 2.979e+02 3.641e+02 4.453e+02 7.344e+02, threshold=7.282e+02, percent-clipped=4.0 2023-10-09 11:17:24,157 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2741069.3333333335, ans=0.125 2023-10-09 11:17:29,763 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2741069.3333333335, ans=0.125 2023-10-09 11:17:37,767 INFO [train.py:1031] (2/4) Epoch 14, batch 2650, loss[loss=0.2379, simple_loss=0.3132, pruned_loss=0.0595, ctc_loss=0.109, over 16240.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2842, pruned_loss=0.06036, ctc_loss=0.1061, over 3289325.75 frames. 
], batch size: 463, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 11:17:45,736 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2741116.0, ans=0.125 2023-10-09 11:17:47,558 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2741116.0, ans=0.0 2023-10-09 11:17:52,238 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2741162.6666666665, ans=0.125 2023-10-09 11:18:06,197 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.58 vs. limit=12.0 2023-10-09 11:18:06,895 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2741209.3333333335, ans=0.1 2023-10-09 11:18:22,698 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2741256.0, ans=0.125 2023-10-09 11:18:28,680 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2741302.6666666665, ans=0.125 2023-10-09 11:18:31,865 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2741302.6666666665, ans=0.125 2023-10-09 11:18:39,291 INFO [train.py:1031] (2/4) Epoch 14, batch 2700, loss[loss=0.2048, simple_loss=0.2543, pruned_loss=0.05888, ctc_loss=0.09369, over 16928.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.29, pruned_loss=0.06395, ctc_loss=0.1122, over 3292904.62 frames. ], batch size: 78, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:18:45,276 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2741349.3333333335, ans=0.125 2023-10-09 11:18:54,151 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2741396.0, ans=0.05 2023-10-09 11:18:59,093 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2741396.0, ans=0.2 2023-10-09 11:19:17,668 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2741489.3333333335, ans=0.125 2023-10-09 11:19:21,285 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-10-09 11:19:31,011 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.501e+02 3.579e+02 4.156e+02 4.960e+02 1.400e+03, threshold=8.312e+02, percent-clipped=4.0 2023-10-09 11:19:34,881 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=22.5 2023-10-09 11:19:42,016 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2023-10-09 11:19:42,376 INFO [train.py:1031] (2/4) Epoch 14, batch 2750, loss[loss=0.2153, simple_loss=0.2591, pruned_loss=0.06427, ctc_loss=0.1072, over 16751.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2932, pruned_loss=0.06306, ctc_loss=0.1111, over 3275558.61 frames. 
], batch size: 121, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 11:19:43,157 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.93 vs. limit=15.0 2023-10-09 11:19:47,914 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2023-10-09 11:20:00,380 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2741629.3333333335, ans=0.2 2023-10-09 11:20:06,791 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:20:09,269 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.50 vs. limit=6.0 2023-10-09 11:20:24,149 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:20:39,395 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2741769.3333333335, ans=0.2 2023-10-09 11:20:44,739 INFO [train.py:1031] (2/4) Epoch 14, batch 2800, loss[loss=0.1694, simple_loss=0.2222, pruned_loss=0.04374, ctc_loss=0.07305, over 16906.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2883, pruned_loss=0.05877, ctc_loss=0.1042, over 3278496.62 frames. ], batch size: 90, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:21:16,243 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2741909.3333333335, ans=0.0 2023-10-09 11:21:24,973 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2741956.0, ans=0.125 2023-10-09 11:21:31,035 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2741956.0, ans=0.125 2023-10-09 11:21:33,179 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=2742002.6666666665, ans=0.02 2023-10-09 11:21:35,708 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.219e+02 3.033e+02 3.735e+02 4.727e+02 1.179e+03, threshold=7.471e+02, percent-clipped=1.0 2023-10-09 11:21:47,212 INFO [train.py:1031] (2/4) Epoch 14, batch 2850, loss[loss=0.1969, simple_loss=0.2733, pruned_loss=0.04408, ctc_loss=0.08097, over 16829.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2837, pruned_loss=0.05659, ctc_loss=0.1004, over 3287022.74 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:22:10,217 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2742096.0, ans=0.125 2023-10-09 11:22:44,975 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.80 vs. limit=6.0 2023-10-09 11:22:51,992 INFO [train.py:1031] (2/4) Epoch 14, batch 2900, loss[loss=0.1905, simple_loss=0.2296, pruned_loss=0.05745, ctc_loss=0.09139, over 16524.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2842, pruned_loss=0.05478, ctc_loss=0.09766, over 3289913.41 frames. 
], batch size: 70, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:23:25,233 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2742376.0, ans=0.0 2023-10-09 11:23:27,026 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2742376.0, ans=0.125 2023-10-09 11:23:29,014 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2742422.6666666665, ans=0.125 2023-10-09 11:23:40,758 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2742469.3333333335, ans=0.0 2023-10-09 11:23:43,139 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+02 3.117e+02 3.737e+02 4.874e+02 8.025e+02, threshold=7.473e+02, percent-clipped=2.0 2023-10-09 11:23:52,692 INFO [train.py:1031] (2/4) Epoch 14, batch 2950, loss[loss=0.2239, simple_loss=0.2864, pruned_loss=0.05914, ctc_loss=0.108, over 16965.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2852, pruned_loss=0.05543, ctc_loss=0.09887, over 3289458.46 frames. ], batch size: 243, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:23:54,581 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2742516.0, ans=0.0 2023-10-09 11:24:24,851 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2742609.3333333335, ans=0.05 2023-10-09 11:24:32,728 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2742656.0, ans=0.125 2023-10-09 11:24:40,704 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2742656.0, ans=0.0 2023-10-09 11:24:52,158 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2742702.6666666665, ans=0.125 2023-10-09 11:24:55,799 INFO [train.py:1031] (2/4) Epoch 14, batch 3000, loss[loss=0.2278, simple_loss=0.2991, pruned_loss=0.05588, ctc_loss=0.1117, over 16886.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2843, pruned_loss=0.05777, ctc_loss=0.1025, over 3281395.51 frames. ], batch size: 309, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:24:55,800 INFO [train.py:1054] (2/4) Computing validation loss 2023-10-09 11:25:13,598 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2392, simple_loss=0.3062, pruned_loss=0.06637, ctc_loss=0.09863, over 1796401.00 frames. 2023-10-09 11:25:13,599 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB 2023-10-09 11:25:16,747 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2742749.3333333335, ans=0.125 2023-10-09 11:25:16,856 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2742749.3333333335, ans=15.0 2023-10-09 11:25:18,983 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2023-10-09 11:25:21,071 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. 
limit=15.0 2023-10-09 11:25:30,171 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2742796.0, ans=0.0 2023-10-09 11:25:53,435 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2742889.3333333335, ans=0.125 2023-10-09 11:26:03,522 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.96 vs. limit=10.0 2023-10-09 11:26:05,135 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.377e+02 3.037e+02 3.527e+02 4.152e+02 6.631e+02, threshold=7.054e+02, percent-clipped=0.0 2023-10-09 11:26:05,801 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.51 vs. limit=22.5 2023-10-09 11:26:08,731 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2742936.0, ans=0.0 2023-10-09 11:26:10,368 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2742936.0, ans=0.125 2023-10-09 11:26:14,948 INFO [train.py:1031] (2/4) Epoch 14, batch 3050, loss[loss=0.2134, simple_loss=0.2412, pruned_loss=0.06704, ctc_loss=0.1288, over 15326.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.2805, pruned_loss=0.05813, ctc_loss=0.1034, over 3292810.32 frames. ], batch size: 526, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:26:18,005 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2742982.6666666665, ans=0.0 2023-10-09 11:26:25,708 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2743029.3333333335, ans=0.125 2023-10-09 11:26:30,743 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2743029.3333333335, ans=0.125 2023-10-09 11:27:15,158 INFO [train.py:1031] (2/4) Epoch 14, batch 3100, loss[loss=0.1812, simple_loss=0.2312, pruned_loss=0.04986, ctc_loss=0.07886, over 16710.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2745, pruned_loss=0.05888, ctc_loss=0.1042, over 3294407.44 frames. ], batch size: 95, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 11:27:17,480 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2743216.0, ans=0.1 2023-10-09 11:27:46,566 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2743309.3333333335, ans=0.0 2023-10-09 11:27:49,782 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2743356.0, ans=0.0 2023-10-09 11:28:01,216 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2743356.0, ans=0.125 2023-10-09 11:28:07,757 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. 
limit=6.0 2023-10-09 11:28:07,999 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.086e+02 2.923e+02 3.339e+02 4.092e+02 6.355e+02, threshold=6.678e+02, percent-clipped=0.0 2023-10-09 11:28:13,307 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2743402.6666666665, ans=0.125 2023-10-09 11:28:15,789 INFO [train.py:1031] (2/4) Epoch 14, batch 3150, loss[loss=0.2648, simple_loss=0.3261, pruned_loss=0.07519, ctc_loss=0.1331, over 16401.00 frames. ], tot_loss[loss=0.2127, simple_loss=0.2703, pruned_loss=0.05724, ctc_loss=0.1013, over 3298325.30 frames. ], batch size: 416, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:28:23,297 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2743449.3333333335, ans=0.0 2023-10-09 11:29:08,701 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=22.5 2023-10-09 11:29:10,591 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.14 vs. limit=15.0 2023-10-09 11:29:17,392 INFO [train.py:1031] (2/4) Epoch 14, batch 3200, loss[loss=0.2831, simple_loss=0.3265, pruned_loss=0.08678, ctc_loss=0.1652, over 16653.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2799, pruned_loss=0.05892, ctc_loss=0.1053, over 3303295.53 frames. ], batch size: 351, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:29:19,812 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2743682.6666666665, ans=0.125 2023-10-09 11:29:22,451 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2743682.6666666665, ans=0.1 2023-10-09 11:29:53,496 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2743822.6666666665, ans=0.0 2023-10-09 11:30:12,209 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.450e+02 3.362e+02 3.915e+02 4.671e+02 1.064e+03, threshold=7.829e+02, percent-clipped=5.0 2023-10-09 11:30:18,612 INFO [train.py:1031] (2/4) Epoch 14, batch 3250, loss[loss=0.2147, simple_loss=0.271, pruned_loss=0.05833, ctc_loss=0.1042, over 16900.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.281, pruned_loss=0.06027, ctc_loss=0.1072, over 3297872.84 frames. ], batch size: 189, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:30:22,340 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2743916.0, ans=0.125 2023-10-09 11:30:35,336 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2743962.6666666665, ans=0.2 2023-10-09 11:31:17,112 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2744102.6666666665, ans=0.1 2023-10-09 11:31:23,914 INFO [train.py:1031] (2/4) Epoch 14, batch 3300, loss[loss=0.2804, simple_loss=0.3307, pruned_loss=0.08316, ctc_loss=0.1596, over 16887.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2884, pruned_loss=0.06333, ctc_loss=0.1127, over 3301974.58 frames. 
], batch size: 292, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:31:25,900 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2744149.3333333335, ans=0.125 2023-10-09 11:31:27,117 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=22.5 2023-10-09 11:31:28,266 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.10 vs. limit=10.0 2023-10-09 11:31:31,430 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2744149.3333333335, ans=0.2 2023-10-09 11:31:34,341 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-10-09 11:31:49,557 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2744242.6666666665, ans=0.0 2023-10-09 11:31:51,709 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2744242.6666666665, ans=0.125 2023-10-09 11:32:20,913 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+02 3.186e+02 3.803e+02 4.439e+02 1.060e+03, threshold=7.606e+02, percent-clipped=1.0 2023-10-09 11:32:26,284 INFO [train.py:1031] (2/4) Epoch 14, batch 3350, loss[loss=0.2463, simple_loss=0.2884, pruned_loss=0.07624, ctc_loss=0.1295, over 16722.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2862, pruned_loss=0.06414, ctc_loss=0.1134, over 3300956.87 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:32:40,552 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2744429.3333333335, ans=0.2 2023-10-09 11:32:45,096 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.38 vs. limit=15.0 2023-10-09 11:33:13,242 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2744522.6666666665, ans=0.0 2023-10-09 11:33:15,418 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2744569.3333333335, ans=0.95 2023-10-09 11:33:29,578 INFO [train.py:1031] (2/4) Epoch 14, batch 3400, loss[loss=0.2224, simple_loss=0.2781, pruned_loss=0.06215, ctc_loss=0.1062, over 16536.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2847, pruned_loss=0.06431, ctc_loss=0.1136, over 3288991.94 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 11:33:51,193 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=15.0 2023-10-09 11:33:51,315 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.45 vs. 
limit=22.5 2023-10-09 11:34:00,581 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2744709.3333333335, ans=0.125 2023-10-09 11:34:20,465 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2744802.6666666665, ans=0.1 2023-10-09 11:34:23,074 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2744802.6666666665, ans=0.0 2023-10-09 11:34:26,690 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.487e+02 3.089e+02 3.600e+02 4.217e+02 8.048e+02, threshold=7.200e+02, percent-clipped=1.0 2023-10-09 11:34:28,267 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2744802.6666666665, ans=0.125 2023-10-09 11:34:30,974 INFO [train.py:1031] (2/4) Epoch 14, batch 3450, loss[loss=0.2326, simple_loss=0.3201, pruned_loss=0.05271, ctc_loss=0.09906, over 15185.00 frames. ], tot_loss[loss=0.2317, simple_loss=0.2895, pruned_loss=0.06415, ctc_loss=0.1138, over 3295157.96 frames. ], batch size: 526, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:34:43,457 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2744896.0, ans=0.1 2023-10-09 11:35:07,216 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2744989.3333333335, ans=0.125 2023-10-09 11:35:12,670 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2744989.3333333335, ans=0.2 2023-10-09 11:35:32,419 INFO [train.py:1031] (2/4) Epoch 14, batch 3500, loss[loss=0.2175, simple_loss=0.2998, pruned_loss=0.04892, ctc_loss=0.09325, over 15206.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2855, pruned_loss=0.06221, ctc_loss=0.1102, over 3293471.90 frames. ], batch size: 526, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 11:35:32,977 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.46 vs. limit=15.0 2023-10-09 11:35:40,197 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2745082.6666666665, ans=0.0 2023-10-09 11:35:52,544 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:36:03,053 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2745176.0, ans=0.125 2023-10-09 11:36:12,960 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2745222.6666666665, ans=22.5 2023-10-09 11:36:28,665 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 2.946e+02 3.396e+02 4.316e+02 6.919e+02, threshold=6.792e+02, percent-clipped=0.0 2023-10-09 11:36:31,828 INFO [train.py:1031] (2/4) Epoch 14, batch 3550, loss[loss=0.2079, simple_loss=0.2497, pruned_loss=0.06245, ctc_loss=0.1032, over 16598.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.2784, pruned_loss=0.06108, ctc_loss=0.1077, over 3298369.22 frames. 
], batch size: 110, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:36:33,895 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2745316.0, ans=0.0 2023-10-09 11:36:35,322 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.24 vs. limit=12.0 2023-10-09 11:36:55,127 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2745409.3333333335, ans=0.0 2023-10-09 11:37:19,263 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2745502.6666666665, ans=0.125 2023-10-09 11:37:32,210 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2745549.3333333335, ans=0.125 2023-10-09 11:37:32,881 INFO [train.py:1031] (2/4) Epoch 14, batch 3600, loss[loss=0.1845, simple_loss=0.2182, pruned_loss=0.05539, ctc_loss=0.09992, over 16140.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.2722, pruned_loss=0.06111, ctc_loss=0.1074, over 3294802.97 frames. ], batch size: 466, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:38:10,673 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2745689.3333333335, ans=0.125 2023-10-09 11:38:12,806 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2745689.3333333335, ans=0.125 2023-10-09 11:38:21,566 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2745736.0, ans=0.125 2023-10-09 11:38:31,934 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+02 3.169e+02 3.614e+02 4.285e+02 9.204e+02, threshold=7.228e+02, percent-clipped=2.0 2023-10-09 11:38:33,631 INFO [train.py:1031] (2/4) Epoch 14, batch 3650, loss[loss=0.2109, simple_loss=0.2647, pruned_loss=0.05968, ctc_loss=0.09413, over 16990.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2685, pruned_loss=0.06112, ctc_loss=0.107, over 3296022.85 frames. ], batch size: 86, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:38:41,872 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2745782.6666666665, ans=0.125 2023-10-09 11:38:51,303 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2745829.3333333335, ans=0.0 2023-10-09 11:39:36,568 INFO [train.py:1031] (2/4) Epoch 14, batch 3700, loss[loss=0.279, simple_loss=0.3254, pruned_loss=0.08515, ctc_loss=0.1556, over 16786.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2741, pruned_loss=0.06421, ctc_loss=0.1119, over 3302543.30 frames. ], batch size: 329, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:39:39,105 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2746016.0, ans=0.2 2023-10-09 11:39:44,972 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2746016.0, ans=0.125 2023-10-09 11:39:48,285 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.74 vs. 
limit=15.0 2023-10-09 11:40:02,505 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2746109.3333333335, ans=0.1 2023-10-09 11:40:40,093 INFO [train.py:1031] (2/4) Epoch 14, batch 3750, loss[loss=0.2386, simple_loss=0.2936, pruned_loss=0.06747, ctc_loss=0.122, over 16276.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2818, pruned_loss=0.06778, ctc_loss=0.1175, over 3301459.65 frames. ], batch size: 463, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 11:40:41,108 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.545e+02 3.323e+02 3.693e+02 4.050e+02 7.078e+02, threshold=7.386e+02, percent-clipped=0.0 2023-10-09 11:40:43,085 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2746249.3333333335, ans=0.125 2023-10-09 11:40:47,433 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2746249.3333333335, ans=0.125 2023-10-09 11:40:54,859 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2746296.0, ans=0.125 2023-10-09 11:41:12,060 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2746342.6666666665, ans=0.125 2023-10-09 11:41:29,290 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2746436.0, ans=0.1 2023-10-09 11:41:36,073 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2746436.0, ans=0.0 2023-10-09 11:41:43,229 INFO [train.py:1031] (2/4) Epoch 14, batch 3800, loss[loss=0.219, simple_loss=0.2661, pruned_loss=0.06494, ctc_loss=0.1053, over 16598.00 frames. ], tot_loss[loss=0.238, simple_loss=0.2879, pruned_loss=0.06985, ctc_loss=0.121, over 3300096.44 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:41:57,168 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.43 vs. limit=10.0 2023-10-09 11:42:15,464 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:42:41,228 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2746669.3333333335, ans=0.125 2023-10-09 11:42:44,576 INFO [train.py:1031] (2/4) Epoch 14, batch 3850, loss[loss=0.1837, simple_loss=0.2303, pruned_loss=0.05076, ctc_loss=0.08901, over 16824.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2813, pruned_loss=0.06771, ctc_loss=0.1176, over 3305252.29 frames. 
], batch size: 176, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 11:42:45,880 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2746716.0, ans=0.2 2023-10-09 11:42:47,258 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.380e+02 3.112e+02 3.545e+02 3.960e+02 7.617e+02, threshold=7.089e+02, percent-clipped=1.0 2023-10-09 11:43:35,372 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2746902.6666666665, ans=0.2 2023-10-09 11:43:46,337 INFO [train.py:1031] (2/4) Epoch 14, batch 3900, loss[loss=0.2309, simple_loss=0.3034, pruned_loss=0.05762, ctc_loss=0.108, over 16800.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2797, pruned_loss=0.06495, ctc_loss=0.1134, over 3297511.50 frames. ], batch size: 242, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:43:57,814 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2746996.0, ans=0.0 2023-10-09 11:44:06,129 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2746996.0, ans=0.0 2023-10-09 11:44:47,987 INFO [train.py:1031] (2/4) Epoch 14, batch 3950, loss[loss=0.1881, simple_loss=0.2467, pruned_loss=0.04762, ctc_loss=0.08544, over 16832.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2769, pruned_loss=0.06388, ctc_loss=0.1118, over 3295946.44 frames. ], batch size: 243, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:44:50,713 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+02 3.084e+02 3.430e+02 4.061e+02 1.180e+03, threshold=6.860e+02, percent-clipped=1.0 2023-10-09 11:45:06,755 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2747229.3333333335, ans=15.0 2023-10-09 11:45:10,890 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0 2023-10-09 11:45:16,429 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2747276.0, ans=0.5 2023-10-09 11:45:23,669 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2747276.0, ans=0.125 2023-10-09 11:45:37,981 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=22.5 2023-10-09 11:45:39,443 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2747369.3333333335, ans=0.125 2023-10-09 11:45:50,020 INFO [train.py:1031] (2/4) Epoch 14, batch 4000, loss[loss=0.2846, simple_loss=0.3199, pruned_loss=0.09115, ctc_loss=0.1674, over 16792.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2778, pruned_loss=0.06545, ctc_loss=0.1144, over 3292792.58 frames. 
], batch size: 329, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:45:51,506 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2747416.0, ans=0.125 2023-10-09 11:45:54,780 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2747416.0, ans=0.2 2023-10-09 11:45:56,798 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2747416.0, ans=0.125 2023-10-09 11:46:00,643 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2747416.0, ans=0.0 2023-10-09 11:46:15,721 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2747509.3333333335, ans=0.125 2023-10-09 11:46:22,651 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2747509.3333333335, ans=0.0 2023-10-09 11:46:26,358 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2023-10-09 11:46:34,988 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.17 vs. limit=22.5 2023-10-09 11:46:44,030 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2747602.6666666665, ans=0.035 2023-10-09 11:46:51,637 INFO [train.py:1031] (2/4) Epoch 14, batch 4050, loss[loss=0.2093, simple_loss=0.2634, pruned_loss=0.0575, ctc_loss=0.1003, over 16920.00 frames. ], tot_loss[loss=0.2321, simple_loss=0.2818, pruned_loss=0.06755, ctc_loss=0.118, over 3289153.60 frames. ], batch size: 215, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:46:54,423 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.332e+02 3.986e+02 4.544e+02 6.934e+02, threshold=7.972e+02, percent-clipped=1.0 2023-10-09 11:47:03,741 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2747696.0, ans=0.0 2023-10-09 11:47:35,808 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2747789.3333333335, ans=0.2 2023-10-09 11:47:37,409 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2747789.3333333335, ans=0.2 2023-10-09 11:47:52,885 INFO [train.py:1031] (2/4) Epoch 14, batch 4100, loss[loss=0.2364, simple_loss=0.2928, pruned_loss=0.06621, ctc_loss=0.1188, over 17002.00 frames. ], tot_loss[loss=0.2321, simple_loss=0.2814, pruned_loss=0.0678, ctc_loss=0.1183, over 3299528.21 frames. 
], batch size: 258, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:48:05,639 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2747929.3333333335, ans=0.1 2023-10-09 11:48:16,520 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2747976.0, ans=0.125 2023-10-09 11:48:22,414 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=2747976.0, ans=0.2 2023-10-09 11:48:35,870 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.99 vs. limit=15.0 2023-10-09 11:48:54,162 INFO [train.py:1031] (2/4) Epoch 14, batch 4150, loss[loss=0.2016, simple_loss=0.2419, pruned_loss=0.06129, ctc_loss=0.09677, over 16662.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.28, pruned_loss=0.06638, ctc_loss=0.1156, over 3296340.60 frames. ], batch size: 140, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:48:57,963 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+02 3.212e+02 3.640e+02 4.119e+02 7.384e+02, threshold=7.280e+02, percent-clipped=0.0 2023-10-09 11:49:05,966 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2748162.6666666665, ans=0.2 2023-10-09 11:49:17,935 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2748162.6666666665, ans=0.1 2023-10-09 11:49:23,179 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2748209.3333333335, ans=0.1 2023-10-09 11:49:32,319 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2748256.0, ans=0.125 2023-10-09 11:49:45,357 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=22.5 2023-10-09 11:49:56,390 INFO [train.py:1031] (2/4) Epoch 14, batch 4200, loss[loss=0.2104, simple_loss=0.2525, pruned_loss=0.06421, ctc_loss=0.09975, over 16762.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2776, pruned_loss=0.06485, ctc_loss=0.1119, over 3283544.08 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:50:12,692 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2748396.0, ans=0.2 2023-10-09 11:50:18,724 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2748396.0, ans=0.0 2023-10-09 11:50:36,576 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:50:38,774 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2748489.3333333335, ans=0.0 2023-10-09 11:50:41,183 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.81 vs. 
limit=15.0 2023-10-09 11:50:50,073 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2748536.0, ans=0.125 2023-10-09 11:50:56,839 INFO [train.py:1031] (2/4) Epoch 14, batch 4250, loss[loss=0.2868, simple_loss=0.3165, pruned_loss=0.09475, ctc_loss=0.1693, over 16844.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.2787, pruned_loss=0.06568, ctc_loss=0.1132, over 3292911.38 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:51:03,482 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+02 3.251e+02 3.808e+02 4.615e+02 8.624e+02, threshold=7.616e+02, percent-clipped=2.0 2023-10-09 11:51:21,787 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2748676.0, ans=0.2 2023-10-09 11:51:30,431 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2748676.0, ans=0.2 2023-10-09 11:51:31,462 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2748676.0, ans=0.0 2023-10-09 11:51:35,383 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2748722.6666666665, ans=10.0 2023-10-09 11:51:58,939 INFO [train.py:1031] (2/4) Epoch 14, batch 4300, loss[loss=0.1235, simple_loss=0.1572, pruned_loss=0.03317, ctc_loss=0.05848, over 13240.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2853, pruned_loss=0.06675, ctc_loss=0.1156, over 3296455.69 frames. ], batch size: 49, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:52:07,684 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2748816.0, ans=0.1 2023-10-09 11:52:08,784 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2748816.0, ans=0.125 2023-10-09 11:52:21,097 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2748862.6666666665, ans=0.125 2023-10-09 11:52:33,784 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2748909.3333333335, ans=0.125 2023-10-09 11:52:50,693 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2749002.6666666665, ans=0.125 2023-10-09 11:52:54,656 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2749002.6666666665, ans=0.0 2023-10-09 11:53:04,755 INFO [train.py:1031] (2/4) Epoch 14, batch 4350, loss[loss=0.2827, simple_loss=0.3083, pruned_loss=0.09803, ctc_loss=0.1523, over 16860.00 frames. ], tot_loss[loss=0.2425, simple_loss=0.2955, pruned_loss=0.07034, ctc_loss=0.1222, over 3298447.51 frames. 
], batch size: 90, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:53:11,362 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.421e+02 3.477e+02 4.126e+02 5.175e+02 8.890e+02, threshold=8.251e+02, percent-clipped=2.0 2023-10-09 11:53:11,640 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2749049.3333333335, ans=0.0 2023-10-09 11:53:17,031 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2749096.0, ans=0.1 2023-10-09 11:53:33,798 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5 2023-10-09 11:54:06,537 INFO [train.py:1031] (2/4) Epoch 14, batch 4400, loss[loss=0.1761, simple_loss=0.2357, pruned_loss=0.04451, ctc_loss=0.06868, over 16835.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.2907, pruned_loss=0.06898, ctc_loss=0.1191, over 3299614.05 frames. ], batch size: 176, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:54:15,621 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2749282.6666666665, ans=0.125 2023-10-09 11:54:20,156 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2749329.3333333335, ans=0.125 2023-10-09 11:54:29,808 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2749329.3333333335, ans=0.125 2023-10-09 11:54:37,458 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2749376.0, ans=0.0 2023-10-09 11:55:08,713 INFO [train.py:1031] (2/4) Epoch 14, batch 4450, loss[loss=0.2704, simple_loss=0.3057, pruned_loss=0.08616, ctc_loss=0.1567, over 16828.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2852, pruned_loss=0.06705, ctc_loss=0.1154, over 3289510.90 frames. ], batch size: 328, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:55:16,085 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.485e+02 3.136e+02 3.606e+02 4.303e+02 6.153e+02, threshold=7.211e+02, percent-clipped=0.0 2023-10-09 11:55:38,522 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2749609.3333333335, ans=0.125 2023-10-09 11:55:38,604 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2749609.3333333335, ans=0.1 2023-10-09 11:55:41,042 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2023-10-09 11:55:55,758 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.22 vs. limit=6.0 2023-10-09 11:56:09,706 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 11:56:10,353 INFO [train.py:1031] (2/4) Epoch 14, batch 4500, loss[loss=0.2279, simple_loss=0.2816, pruned_loss=0.06456, ctc_loss=0.1129, over 17025.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.2841, pruned_loss=0.06775, ctc_loss=0.1168, over 3296766.04 frames. 
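], batch size: 216, lr: 2.59e-03, grad_scale: 4.0

The recurring optim.py:471 lines summarize the recent distribution of gradient norms (five quantiles from smallest to largest) alongside the clipping threshold currently in force and the share of batches clipped. A hedged sketch of this style of adaptive clipping is below: it keeps a rolling window of observed norms, sets the threshold from the window's median scaled by `Clipping_scale`, and rescales any gradient that exceeds it. The window size and the precise threshold rule are illustrative assumptions, not the ScaledAdam logic in optim.py.

```python
from collections import deque
import torch

class AdaptiveGradClipper:
    """Clip gradients against a threshold derived from recent grad norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # rolling record of grad norms
        self.batches = 0
        self.clipped = 0

    def __call__(self, parameters) -> None:
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        q = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.clipping_scale * q[2].item()  # scale * median
        self.batches += 1
        if norm > threshold:
            self.clipped += 1
            for g in grads:
                g.mul_(threshold / norm)  # rescale, like clip_grad_norm_
        print(
            f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
            + " ".join(f"{v:.3e}" for v in q.tolist())
            + f", threshold={threshold:.3e}, "
            f"percent-clipped={100 * self.clipped / self.batches:.1f}"
        )
```

In a training loop the clipper would run after `loss.backward()` and before the optimizer step.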
2023-10-09 11:56:22,274 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0 2023-10-09 11:57:02,283 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2749936.0, ans=0.1 2023-10-09 11:57:12,368 INFO [train.py:1031] (2/4) Epoch 14, batch 4550, loss[loss=0.2191, simple_loss=0.3008, pruned_loss=0.05097, ctc_loss=0.08875, over 16896.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.2873, pruned_loss=0.06663, ctc_loss=0.1152, over 3297002.10 frames. ], batch size: 243, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 11:57:20,843 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+02 3.133e+02 3.571e+02 4.090e+02 7.081e+02, threshold=7.142e+02, percent-clipped=0.0 2023-10-09 11:57:22,257 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2749982.6666666665, ans=0.125 2023-10-09 11:57:30,227 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0 2023-10-09 11:58:09,500 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2750169.3333333335, ans=0.125 2023-10-09 11:58:15,117 INFO [train.py:1031] (2/4) Epoch 14, batch 4600, loss[loss=0.3278, simple_loss=0.3519, pruned_loss=0.1122, ctc_loss=0.1981, over 16677.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.2912, pruned_loss=0.06743, ctc_loss=0.1167, over 3297397.84 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 11:59:08,214 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2750402.6666666665, ans=0.2 2023-10-09 11:59:18,163 INFO [train.py:1031] (2/4) Epoch 14, batch 4650, loss[loss=0.1865, simple_loss=0.26, pruned_loss=0.04201, ctc_loss=0.07252, over 16787.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.2932, pruned_loss=0.06768, ctc_loss=0.1168, over 3298835.61 frames. ], batch size: 164, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 11:59:24,944 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2750449.3333333335, ans=0.125 2023-10-09 11:59:28,531 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.18 vs. limit=12.0 2023-10-09 11:59:28,966 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.652e+02 3.249e+02 3.763e+02 4.381e+02 6.611e+02, threshold=7.526e+02, percent-clipped=0.0 2023-10-09 11:59:42,691 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=22.5 2023-10-09 11:59:57,541 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.33 vs.
limit=15.0 2023-10-09 12:00:06,879 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2750636.0, ans=0.1 2023-10-09 12:00:08,924 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2750636.0, ans=0.2 2023-10-09 12:00:14,482 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2750636.0, ans=0.125 2023-10-09 12:00:19,919 INFO [train.py:1031] (2/4) Epoch 14, batch 4700, loss[loss=0.2249, simple_loss=0.2761, pruned_loss=0.06563, ctc_loss=0.1062, over 16520.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2886, pruned_loss=0.0634, ctc_loss=0.1102, over 3283098.08 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:00:28,928 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=22.5 2023-10-09 12:00:52,263 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2750776.0, ans=0.09899494936611666 2023-10-09 12:00:52,463 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-10-09 12:01:02,608 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2750822.6666666665, ans=0.125 2023-10-09 12:01:03,489 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2750822.6666666665, ans=0.125 2023-10-09 12:01:19,300 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2750869.3333333335, ans=0.2 2023-10-09 12:01:22,368 INFO [train.py:1031] (2/4) Epoch 14, batch 4750, loss[loss=0.1935, simple_loss=0.2685, pruned_loss=0.04407, ctc_loss=0.07576, over 16691.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2898, pruned_loss=0.06391, ctc_loss=0.1112, over 3288860.21 frames. 
], batch size: 151, lr: 2.59e-03, grad_scale: 1.0 2023-10-09 12:01:22,744 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2750916.0, ans=0.04949747468305833 2023-10-09 12:01:23,719 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2750916.0, ans=0.125 2023-10-09 12:01:35,190 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.335e+02 3.139e+02 3.749e+02 4.382e+02 2.421e+03, threshold=7.497e+02, percent-clipped=2.0 2023-10-09 12:01:40,788 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=2750962.6666666665, ans=15.0 2023-10-09 12:01:43,248 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:01:47,057 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2751009.3333333335, ans=0.0 2023-10-09 12:01:58,207 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2751056.0, ans=0.125 2023-10-09 12:01:58,698 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0 2023-10-09 12:02:22,935 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2751102.6666666665, ans=0.125 2023-10-09 12:02:24,722 INFO [train.py:1031] (2/4) Epoch 14, batch 4800, loss[loss=0.2223, simple_loss=0.2919, pruned_loss=0.05676, ctc_loss=0.09785, over 16800.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2892, pruned_loss=0.06271, ctc_loss=0.1097, over 3285278.35 frames. ], batch size: 202, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:02:38,363 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2751196.0, ans=0.125 2023-10-09 12:02:50,696 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2751242.6666666665, ans=0.1 2023-10-09 12:03:07,579 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=2751289.3333333335, ans=0.02 2023-10-09 12:03:14,077 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2751289.3333333335, ans=0.035 2023-10-09 12:03:19,063 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2751336.0, ans=0.125 2023-10-09 12:03:19,098 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2751336.0, ans=0.2 2023-10-09 12:03:22,815 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2751336.0, ans=0.2 2023-10-09 12:03:27,192 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0 2023-10-09 12:03:28,565 INFO [train.py:1031] (2/4) Epoch 14, batch 4850, loss[loss=0.3051, simple_loss=0.3443, pruned_loss=0.09757, ctc_loss=0.177, over 16826.00 frames. ], tot_loss[loss=0.2381, simple_loss=0.2955, pruned_loss=0.06688, ctc_loss=0.1172, over 3294563.01 frames. 
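], batch size: 329, lr: 2.59e-03, grad_scale: 2.0

Alongside the scheduled values, scaling.py:1069 periodically prints WithLoss entries such as the self_attn_weights line above, reporting an auxiliary loss summed over the batch (0.000e+00 whenever the penalty is inactive). One way to attach such a diagnostic penalty to an activation without disturbing the forward computation is a straight-through autograd function that injects the penalty's gradient during backward. The sketch below is an assumed mechanism with an illustrative penalty (magnitudes above a limit); it is not the actual scaling.py implementation, and the limit, scale, and name are placeholders.

```python
import torch

class WithAuxLoss(torch.autograd.Function):
    """Identity in forward; in backward, adds the gradient of an
    auxiliary penalty on the activation to the incoming gradient."""

    @staticmethod
    def forward(ctx, x, limit: float, scale: float, name: str):
        with torch.enable_grad():
            xd = x.detach().requires_grad_(True)
            # illustrative penalty: values whose magnitude exceeds `limit`
            aux = scale * (xd.abs() - limit).clamp(min=0.0).sum()
        ctx.save_for_backward(xd, aux)
        print(f"WithLoss: name={name}, loss-sum={aux.item():.3e}")
        return x

    @staticmethod
    def backward(ctx, grad_out):
        xd, aux = ctx.saved_tensors
        (aux_grad,) = torch.autograd.grad(aux, xd)
        return grad_out + aux_grad, None, None, None

x = torch.randn(4, 8, requires_grad=True)
y = WithAuxLoss.apply(x, 5.0, 0.01, "encoder...self_attn_weights")
y.sum().backward()  # x.grad now includes the (here zero) penalty gradient
```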
2023-10-09 12:03:29,937 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2751382.6666666665, ans=0.1 2023-10-09 12:03:38,165 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2751382.6666666665, ans=0.0 2023-10-09 12:03:40,841 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2751429.3333333335, ans=0.125 2023-10-09 12:03:42,635 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.280e+02 3.688e+02 4.479e+02 9.310e+02, threshold=7.375e+02, percent-clipped=2.0 2023-10-09 12:04:00,049 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2751476.0, ans=0.0 2023-10-09 12:04:01,003 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2751476.0, ans=0.015 2023-10-09 12:04:19,131 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2751569.3333333335, ans=0.0 2023-10-09 12:04:31,376 INFO [train.py:1031] (2/4) Epoch 14, batch 4900, loss[loss=0.307, simple_loss=0.3527, pruned_loss=0.0953, ctc_loss=0.1767, over 16708.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.2952, pruned_loss=0.06577, ctc_loss=0.1155, over 3289980.15 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:04:32,051 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.04 vs. limit=15.0 2023-10-09 12:04:55,313 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:05:26,624 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2751802.6666666665, ans=0.0 2023-10-09 12:05:36,551 INFO [train.py:1031] (2/4) Epoch 14, batch 4950, loss[loss=0.2375, simple_loss=0.2897, pruned_loss=0.06903, ctc_loss=0.1181, over 16827.00 frames. ], tot_loss[loss=0.241, simple_loss=0.2982, pruned_loss=0.06804, ctc_loss=0.1191, over 3283195.05 frames. ], batch size: 228, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:05:51,600 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.251e+02 3.637e+02 4.222e+02 8.685e+02, threshold=7.275e+02, percent-clipped=2.0 2023-10-09 12:05:55,766 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2751896.0, ans=0.035 2023-10-09 12:05:55,834 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2751896.0, ans=0.125 2023-10-09 12:06:09,612 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.09 vs.
limit=15.0 2023-10-09 12:06:17,334 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2751989.3333333335, ans=0.0 2023-10-09 12:06:31,762 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2752036.0, ans=0.125 2023-10-09 12:06:39,357 INFO [train.py:1031] (2/4) Epoch 14, batch 5000, loss[loss=0.2188, simple_loss=0.2664, pruned_loss=0.06385, ctc_loss=0.1089, over 16757.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2926, pruned_loss=0.06824, ctc_loss=0.1189, over 3289989.59 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:06:41,247 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.75 vs. limit=22.5 2023-10-09 12:06:47,211 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2752082.6666666665, ans=0.1 2023-10-09 12:07:08,569 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2752176.0, ans=0.0 2023-10-09 12:07:09,760 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-10-09 12:07:14,806 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2752222.6666666665, ans=0.125 2023-10-09 12:07:15,363 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.25 vs. limit=22.5 2023-10-09 12:07:19,148 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2752222.6666666665, ans=0.2 2023-10-09 12:07:20,693 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.13 vs. limit=10.0 2023-10-09 12:07:27,660 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2752269.3333333335, ans=0.0 2023-10-09 12:07:41,589 INFO [train.py:1031] (2/4) Epoch 14, batch 5050, loss[loss=0.2335, simple_loss=0.3095, pruned_loss=0.05611, ctc_loss=0.1132, over 16843.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.287, pruned_loss=0.06671, ctc_loss=0.1163, over 3291540.73 frames. ], batch size: 292, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:07:42,312 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.09 vs. 
limit=15.0 2023-10-09 12:07:42,951 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2752316.0, ans=0.125 2023-10-09 12:07:54,038 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2752362.6666666665, ans=0.125 2023-10-09 12:07:56,489 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.463e+02 3.308e+02 3.761e+02 4.513e+02 1.207e+03, threshold=7.522e+02, percent-clipped=1.0 2023-10-09 12:08:02,106 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2752362.6666666665, ans=0.125 2023-10-09 12:08:03,339 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2752362.6666666665, ans=0.125 2023-10-09 12:08:05,023 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2752409.3333333335, ans=0.125 2023-10-09 12:08:11,752 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2752409.3333333335, ans=0.5 2023-10-09 12:08:12,785 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2752409.3333333335, ans=0.0 2023-10-09 12:08:14,223 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=2752409.3333333335, ans=15.0 2023-10-09 12:08:21,385 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2752456.0, ans=0.125 2023-10-09 12:08:42,500 INFO [train.py:1031] (2/4) Epoch 14, batch 5100, loss[loss=0.2615, simple_loss=0.3274, pruned_loss=0.07304, ctc_loss=0.1237, over 16817.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2907, pruned_loss=0.06571, ctc_loss=0.115, over 3290029.51 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:08:55,620 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2752596.0, ans=0.125 2023-10-09 12:08:56,676 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2752596.0, ans=0.0 2023-10-09 12:09:00,032 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2752596.0, ans=0.0 2023-10-09 12:09:18,530 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2752689.3333333335, ans=0.125 2023-10-09 12:09:24,041 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=2752689.3333333335, ans=0.02 2023-10-09 12:09:41,619 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2752736.0, ans=0.125 2023-10-09 12:09:43,373 INFO [train.py:1031] (2/4) Epoch 14, batch 5150, loss[loss=0.2542, simple_loss=0.3018, pruned_loss=0.07524, ctc_loss=0.1403, over 16955.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2914, pruned_loss=0.06793, ctc_loss=0.1189, over 3297728.47 frames. 
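limit=15.0

The entry completed just above is one of the few in this section where a whitening metric (15.09) actually exceeds its limit (15.0). These scaling.py:979 lines track, per instrumented activation, how unevenly the feature covariance spreads its energy; a hedged reconstruction of such a metric is below. It equals 1 when all covariance eigenvalues are equal (perfectly whitened features) and grows as a few directions dominate. The exact normalization in scaling.py may differ, and crossing the limit is presumably what triggers the corrective whitening penalty.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into groups
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n       # per-group covariance
    d = cov.shape[-1]
    tr_c = cov.diagonal(dim1=-2, dim2=-1).sum(-1)      # sum of eigenvalues
    tr_c2 = (cov * cov.transpose(-2, -1)).sum((-2, -1))  # sum of squared eigs
    # d * sum(eig^2) / sum(eig)^2 == 1 iff all eigenvalues are equal
    return (d * tr_c2 / (tr_c ** 2 + 1e-20)).mean().item()

torch.manual_seed(0)
white = torch.randn(1000, 512)   # roughly whitened features
skewed = white.clone()
skewed[:, :8] *= 10.0            # a few channels dominate the variance
print(whitening_metric(white))   # near 1, plus sampling noise
print(whitening_metric(skewed))  # far above a limit like 15.0
```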
2023-10-09 12:09:50,288 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2752782.6666666665, ans=0.1 2023-10-09 12:09:51,419 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2752782.6666666665, ans=0.1 2023-10-09 12:09:58,676 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2752829.3333333335, ans=0.125 2023-10-09 12:10:00,361 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.450e+02 3.267e+02 3.761e+02 4.649e+02 7.424e+02, threshold=7.522e+02, percent-clipped=0.0 2023-10-09 12:10:11,011 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2752876.0, ans=0.1 2023-10-09 12:10:26,281 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2752922.6666666665, ans=0.0 2023-10-09 12:10:45,583 INFO [train.py:1031] (2/4) Epoch 14, batch 5200, loss[loss=0.2025, simple_loss=0.266, pruned_loss=0.05179, ctc_loss=0.08846, over 16825.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.2911, pruned_loss=0.06722, ctc_loss=0.1179, over 3300410.18 frames. ], batch size: 164, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:11:01,022 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2753062.6666666665, ans=0.1 2023-10-09 12:11:04,951 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2753062.6666666665, ans=0.125 2023-10-09 12:11:05,915 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2753062.6666666665, ans=0.125 2023-10-09 12:11:11,179 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2753109.3333333335, ans=0.125 2023-10-09 12:11:14,319 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2753109.3333333335, ans=0.125 2023-10-09 12:11:37,135 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:11:47,487 INFO [train.py:1031] (2/4) Epoch 14, batch 5250, loss[loss=0.2131, simple_loss=0.2632, pruned_loss=0.06135, ctc_loss=0.1005, over 16933.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.284, pruned_loss=0.06577, ctc_loss=0.115, over 3297742.82 frames.
], batch size: 78, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:11:47,776 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2753249.3333333335, ans=0.0 2023-10-09 12:11:47,858 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2753249.3333333335, ans=0.125 2023-10-09 12:11:50,052 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2753249.3333333335, ans=0.125 2023-10-09 12:11:54,596 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2753249.3333333335, ans=0.1 2023-10-09 12:11:56,763 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2753249.3333333335, ans=0.125 2023-10-09 12:12:05,608 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+02 2.914e+02 3.261e+02 3.772e+02 6.960e+02, threshold=6.522e+02, percent-clipped=0.0 2023-10-09 12:12:14,919 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2753342.6666666665, ans=0.125 2023-10-09 12:12:36,467 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:12:46,810 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2753436.0, ans=0.125 2023-10-09 12:12:49,310 INFO [train.py:1031] (2/4) Epoch 14, batch 5300, loss[loss=0.3612, simple_loss=0.4148, pruned_loss=0.1131, ctc_loss=0.2034, over 16432.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2906, pruned_loss=0.06892, ctc_loss=0.1202, over 3293090.58 frames. ], batch size: 415, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:12:52,489 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:12:57,348 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2753482.6666666665, ans=0.125 2023-10-09 12:12:58,413 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2753482.6666666665, ans=0.125 2023-10-09 12:13:10,151 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2753529.3333333335, ans=0.0 2023-10-09 12:13:13,358 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2753576.0, ans=0.125 2023-10-09 12:13:41,136 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2753669.3333333335, ans=0.0 2023-10-09 12:13:44,247 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2753669.3333333335, ans=0.125 2023-10-09 12:13:51,898 INFO [train.py:1031] (2/4) Epoch 14, batch 5350, loss[loss=0.2543, simple_loss=0.3146, pruned_loss=0.07296, ctc_loss=0.1201, over 16159.00 frames. ], tot_loss[loss=0.2489, simple_loss=0.3019, pruned_loss=0.07258, ctc_loss=0.1267, over 3288081.91 frames. 
], batch size: 463, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:14:08,418 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2753762.6666666665, ans=0.125 2023-10-09 12:14:12,278 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.795e+02 3.646e+02 4.307e+02 5.553e+02 1.031e+03, threshold=8.614e+02, percent-clipped=13.0 2023-10-09 12:14:19,952 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2753809.3333333335, ans=0.125 2023-10-09 12:14:51,581 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2753902.6666666665, ans=0.125 2023-10-09 12:14:54,877 INFO [train.py:1031] (2/4) Epoch 14, batch 5400, loss[loss=0.2702, simple_loss=0.3073, pruned_loss=0.08661, ctc_loss=0.1498, over 16777.00 frames. ], tot_loss[loss=0.25, simple_loss=0.3029, pruned_loss=0.07302, ctc_loss=0.1276, over 3293865.79 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:14:59,407 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2753949.3333333335, ans=0.0 2023-10-09 12:15:11,880 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2753996.0, ans=0.125 2023-10-09 12:15:19,518 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=2754042.6666666665, ans=0.05 2023-10-09 12:15:34,352 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2754089.3333333335, ans=0.1 2023-10-09 12:15:52,761 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2754136.0, ans=0.125 2023-10-09 12:15:55,501 INFO [train.py:1031] (2/4) Epoch 14, batch 5450, loss[loss=0.206, simple_loss=0.2627, pruned_loss=0.05558, ctc_loss=0.09532, over 16715.00 frames. ], tot_loss[loss=0.2424, simple_loss=0.2942, pruned_loss=0.07061, ctc_loss=0.1234, over 3301042.73 frames. ], batch size: 95, lr: 2.59e-03, grad_scale: 2.0 2023-10-09 12:15:59,045 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2754182.6666666665, ans=0.125 2023-10-09 12:16:11,175 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2754229.3333333335, ans=0.95 2023-10-09 12:16:16,183 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.414e+02 3.013e+02 3.420e+02 3.920e+02 8.304e+02, threshold=6.840e+02, percent-clipped=0.0 2023-10-09 12:16:18,612 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2754229.3333333335, ans=0.125 2023-10-09 12:16:23,923 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2754276.0, ans=0.125 2023-10-09 12:16:25,893 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2754276.0, ans=0.125 2023-10-09 12:16:27,533 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.18 vs. 
limit=15.0 2023-10-09 12:16:53,739 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2754369.3333333335, ans=0.125 2023-10-09 12:16:57,618 INFO [train.py:1031] (2/4) Epoch 14, batch 5500, loss[loss=0.1868, simple_loss=0.239, pruned_loss=0.05059, ctc_loss=0.08334, over 16940.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.2897, pruned_loss=0.07018, ctc_loss=0.1221, over 3299610.09 frames. ], batch size: 82, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:16:59,059 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2754416.0, ans=0.125 2023-10-09 12:17:14,058 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-10-09 12:17:16,457 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2754462.6666666665, ans=0.125 2023-10-09 12:17:16,739 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2023-10-09 12:17:19,248 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2754462.6666666665, ans=0.125 2023-10-09 12:17:25,887 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2754509.3333333335, ans=0.125 2023-10-09 12:17:45,199 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2754602.6666666665, ans=0.05 2023-10-09 12:17:50,792 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2754602.6666666665, ans=0.125 2023-10-09 12:17:58,544 INFO [train.py:1031] (2/4) Epoch 14, batch 5550, loss[loss=0.1687, simple_loss=0.2396, pruned_loss=0.03588, ctc_loss=0.06522, over 16698.00 frames. ], tot_loss[loss=0.2396, simple_loss=0.2905, pruned_loss=0.06998, ctc_loss=0.1218, over 3294226.44 frames. ], batch size: 151, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:18:05,394 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2754649.3333333335, ans=0.0 2023-10-09 12:18:16,479 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2754696.0, ans=0.05 2023-10-09 12:18:18,621 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.568e+02 3.038e+02 3.521e+02 4.365e+02 6.662e+02, threshold=7.043e+02, percent-clipped=0.0 2023-10-09 12:18:28,083 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2754742.6666666665, ans=0.04949747468305833 2023-10-09 12:18:39,603 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=15.0 2023-10-09 12:18:40,676 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.95 vs. limit=15.0 2023-10-09 12:18:51,933 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.92 vs. 
limit=12.0 2023-10-09 12:18:59,756 INFO [train.py:1031] (2/4) Epoch 14, batch 5600, loss[loss=0.3097, simple_loss=0.3257, pruned_loss=0.1072, ctc_loss=0.1985, over 16843.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.2886, pruned_loss=0.06745, ctc_loss=0.1181, over 3302071.06 frames. ], batch size: 384, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 12:19:13,398 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2754929.3333333335, ans=0.0 2023-10-09 12:19:17,122 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2754929.3333333335, ans=0.0 2023-10-09 12:19:26,687 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:19:46,137 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2755022.6666666665, ans=0.0 2023-10-09 12:19:47,206 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2755069.3333333335, ans=0.125 2023-10-09 12:19:50,179 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.44 vs. limit=10.0 2023-10-09 12:19:50,285 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.32 vs. limit=6.0 2023-10-09 12:19:57,492 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2755069.3333333335, ans=0.125 2023-10-09 12:20:00,156 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2755116.0, ans=0.07 2023-10-09 12:20:00,858 INFO [train.py:1031] (2/4) Epoch 14, batch 5650, loss[loss=0.2434, simple_loss=0.295, pruned_loss=0.07038, ctc_loss=0.1276, over 16798.00 frames. ], tot_loss[loss=0.233, simple_loss=0.2864, pruned_loss=0.06648, ctc_loss=0.1168, over 3297411.47 frames. ], batch size: 188, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:20:13,224 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2755162.6666666665, ans=0.09899494936611666 2023-10-09 12:20:16,775 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2755162.6666666665, ans=0.1 2023-10-09 12:20:19,925 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2755162.6666666665, ans=0.125 2023-10-09 12:20:22,588 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.241e+02 3.073e+02 3.464e+02 4.035e+02 6.010e+02, threshold=6.928e+02, percent-clipped=0.0 2023-10-09 12:20:34,385 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2755209.3333333335, ans=0.2 2023-10-09 12:20:54,183 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2755302.6666666665, ans=0.1 2023-10-09 12:21:01,266 INFO [train.py:1031] (2/4) Epoch 14, batch 5700, loss[loss=0.1956, simple_loss=0.2644, pruned_loss=0.04655, ctc_loss=0.0841, over 17041.00 frames. 
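], tot_loss[loss=0.2333, simple_loss=0.2853, pruned_loss=0.06707, ctc_loss=0.1179, over 3307188.87 frames. ], batch size: 259, lr: 2.59e-03, grad_scale: 4.0

Each train.py:1031 entry decomposes the objective into simple_loss, pruned_loss and ctc_loss. The logged totals are consistent with a weighted sum in which the pruned transducer loss enters at full weight while the simple (linear-boundary) transducer loss and the auxiliary CTC loss are scaled down: with a simple-loss scale of 0.5 and a CTC scale of 0.2, the values this run was launched with, the batch 5700 averages above reproduce exactly. A sketch of that combination, ignoring any warm-up ramping of the weights early in training:

```python
import torch

def combine_losses(
    simple_loss: torch.Tensor,
    pruned_loss: torch.Tensor,
    ctc_loss: torch.Tensor,
    simple_loss_scale: float = 0.5,  # assumed post-warm-up scales
    ctc_loss_scale: float = 0.2,
) -> torch.Tensor:
    # Pruned RNN-T loss at full weight, plus a down-weighted simple
    # loss and a CTC regularizer; the components themselves are assumed
    # to come from k2's pruned RNN-T and PyTorch's CTC loss.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + ctc_loss_scale * ctc_loss)

# The batch 5700 statistics above: 0.5*0.2853 + 0.06707 + 0.2*0.1179
print(combine_losses(torch.tensor(0.2853), torch.tensor(0.06707),
                     torch.tensor(0.1179)))  # -> tensor(0.2333)
```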
2023-10-09 12:21:04,369 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2755349.3333333335, ans=0.125 2023-10-09 12:21:11,944 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2755349.3333333335, ans=0.1 2023-10-09 12:21:13,768 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2755396.0, ans=0.125 2023-10-09 12:21:27,840 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=15.0 2023-10-09 12:21:44,919 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.24 vs. limit=12.0 2023-10-09 12:21:45,941 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.15 vs. limit=15.0 2023-10-09 12:22:04,412 INFO [train.py:1031] (2/4) Epoch 14, batch 5750, loss[loss=0.244, simple_loss=0.3028, pruned_loss=0.06875, ctc_loss=0.1192, over 16743.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2813, pruned_loss=0.06399, ctc_loss=0.113, over 3309840.33 frames. ], batch size: 102, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:22:05,785 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2755582.6666666665, ans=0.125 2023-10-09 12:22:09,614 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2755582.6666666665, ans=0.125 2023-10-09 12:22:28,327 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.248e+02 3.061e+02 3.546e+02 4.294e+02 7.342e+02, threshold=7.092e+02, percent-clipped=2.0 2023-10-09 12:22:28,742 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2755676.0, ans=0.125 2023-10-09 12:22:30,317 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.51 vs. limit=10.0 2023-10-09 12:22:38,587 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2755676.0, ans=0.125 2023-10-09 12:22:48,442 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2755722.6666666665, ans=0.0 2023-10-09 12:22:59,518 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2755769.3333333335, ans=0.0 2023-10-09 12:23:07,365 INFO [train.py:1031] (2/4) Epoch 14, batch 5800, loss[loss=0.2585, simple_loss=0.3013, pruned_loss=0.07904, ctc_loss=0.144, over 16766.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.284, pruned_loss=0.06608, ctc_loss=0.1166, over 3294964.45 frames.
], batch size: 273, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:23:12,920 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2755816.0, ans=0.1 2023-10-09 12:23:21,612 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2755862.6666666665, ans=0.125 2023-10-09 12:23:40,575 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2755909.3333333335, ans=0.125 2023-10-09 12:23:56,314 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2756002.6666666665, ans=0.0 2023-10-09 12:24:06,437 INFO [train.py:1031] (2/4) Epoch 14, batch 5850, loss[loss=0.2102, simple_loss=0.2495, pruned_loss=0.0643, ctc_loss=0.1057, over 16718.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.28, pruned_loss=0.06612, ctc_loss=0.1166, over 3299229.69 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:24:23,351 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2756096.0, ans=0.125 2023-10-09 12:24:31,817 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 3.138e+02 3.574e+02 4.171e+02 9.183e+02, threshold=7.147e+02, percent-clipped=2.0 2023-10-09 12:24:49,841 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2756189.3333333335, ans=0.125 2023-10-09 12:25:05,812 INFO [train.py:1031] (2/4) Epoch 14, batch 5900, loss[loss=0.2304, simple_loss=0.2789, pruned_loss=0.06836, ctc_loss=0.1131, over 16983.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2762, pruned_loss=0.06613, ctc_loss=0.1163, over 3307125.51 frames. ], batch size: 86, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:25:31,365 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0 2023-10-09 12:25:34,018 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.50 vs. limit=6.0 2023-10-09 12:25:35,774 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2756376.0, ans=0.125 2023-10-09 12:25:37,944 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2756376.0, ans=0.1 2023-10-09 12:25:45,533 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2756422.6666666665, ans=0.125 2023-10-09 12:26:06,385 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=15.0 2023-10-09 12:26:06,819 INFO [train.py:1031] (2/4) Epoch 14, batch 5950, loss[loss=0.2373, simple_loss=0.2837, pruned_loss=0.07149, ctc_loss=0.1198, over 16849.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2753, pruned_loss=0.0661, ctc_loss=0.1158, over 3313225.86 frames. 
], batch size: 176, lr: 2.59e-03, grad_scale: 4.0 2023-10-09 12:26:14,180 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2756516.0, ans=0.0 2023-10-09 12:26:15,157 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2756516.0, ans=0.5 2023-10-09 12:26:16,295 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2756516.0, ans=0.125 2023-10-09 12:26:31,791 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.294e+02 3.192e+02 3.462e+02 4.093e+02 6.652e+02, threshold=6.925e+02, percent-clipped=0.0 2023-10-09 12:26:50,543 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2023-10-09 12:27:06,814 INFO [train.py:1031] (2/4) Epoch 14, batch 6000, loss[loss=0.2053, simple_loss=0.2745, pruned_loss=0.05009, ctc_loss=0.09004, over 16956.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2815, pruned_loss=0.06485, ctc_loss=0.1142, over 3312813.80 frames. ], batch size: 216, lr: 2.59e-03, grad_scale: 8.0 2023-10-09 12:27:06,814 INFO [train.py:1054] (2/4) Computing validation loss 2023-10-09 12:27:23,519 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2297, simple_loss=0.3012, pruned_loss=0.0607, ctc_loss=0.09172, over 1796401.00 frames. 2023-10-09 12:27:23,520 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB 2023-10-09 12:27:31,550 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:27:42,472 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2756796.0, ans=0.125 2023-10-09 12:27:47,330 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=2756842.6666666665, ans=0.2 2023-10-09 12:27:49,421 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2756842.6666666665, ans=0.2 2023-10-09 12:28:14,590 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2756936.0, ans=0.125 2023-10-09 12:28:19,463 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2756936.0, ans=0.1 2023-10-09 12:28:23,963 INFO [train.py:1031] (2/4) Epoch 14, batch 6050, loss[loss=0.2011, simple_loss=0.2396, pruned_loss=0.05955, ctc_loss=0.1087, over 16363.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2758, pruned_loss=0.06098, ctc_loss=0.1079, over 3315077.59 frames. 
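], batch size: 416, lr: 2.59e-03, grad_scale: 2.0

At batch 6000 the loop pauses to compute a validation loss over the whole held-out set (about 1.8M frames here) and then reports the peak GPU memory allocated so far. A minimal sketch of such a pass is below; `compute_loss` is a stand-in for the recipe's per-batch loss computation, and the frame-weighted averaging is an assumption about how the aggregate is formed.

```python
import torch

def compute_validation_loss(model, valid_loader, compute_loss, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            # compute_loss is assumed to return (avg_loss, num_frames)
            loss, num_frames = compute_loss(model, batch)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    if torch.cuda.is_available():
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")
    return tot_loss / tot_frames  # per-frame loss over the whole set
```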
2023-10-09 12:28:28,101 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2756982.6666666665, ans=0.125 2023-10-09 12:28:38,488 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2757029.3333333335, ans=0.125 2023-10-09 12:28:46,750 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2757029.3333333335, ans=0.5 2023-10-09 12:28:52,470 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.832e+02 3.391e+02 4.153e+02 6.756e+02, threshold=6.782e+02, percent-clipped=0.0 2023-10-09 12:28:53,055 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.69 vs. limit=15.0 2023-10-09 12:29:05,686 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2023-10-09 12:29:05,727 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.28 vs. limit=15.0 2023-10-09 12:29:07,900 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2757122.6666666665, ans=0.125 2023-10-09 12:29:07,980 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2757122.6666666665, ans=0.125 2023-10-09 12:29:22,843 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2757216.0, ans=0.0 2023-10-09 12:29:23,146 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.65 vs. limit=15.0 2023-10-09 12:29:24,166 INFO [train.py:1031] (2/4) Epoch 14, batch 6100, loss[loss=0.2152, simple_loss=0.2672, pruned_loss=0.06086, ctc_loss=0.1036, over 16734.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.2711, pruned_loss=0.06022, ctc_loss=0.1066, over 3315224.75 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:29:28,647 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs.
limit=15.0 2023-10-09 12:29:33,715 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2757216.0, ans=0.0 2023-10-09 12:29:40,603 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2757262.6666666665, ans=0.0 2023-10-09 12:29:47,730 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2757262.6666666665, ans=0.1 2023-10-09 12:29:49,846 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2757309.3333333335, ans=0.125 2023-10-09 12:29:52,980 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2757309.3333333335, ans=0.0 2023-10-09 12:29:55,745 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2757309.3333333335, ans=0.125 2023-10-09 12:29:56,707 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2757309.3333333335, ans=0.125 2023-10-09 12:30:00,405 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2757356.0, ans=0.035 2023-10-09 12:30:03,208 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2757356.0, ans=0.0 2023-10-09 12:30:07,364 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2757356.0, ans=0.0 2023-10-09 12:30:19,111 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2757402.6666666665, ans=0.125 2023-10-09 12:30:25,861 INFO [train.py:1031] (2/4) Epoch 14, batch 6150, loss[loss=0.202, simple_loss=0.2816, pruned_loss=0.0457, ctc_loss=0.07781, over 16770.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.2749, pruned_loss=0.05855, ctc_loss=0.104, over 3314512.90 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:30:42,051 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2757496.0, ans=0.1 2023-10-09 12:30:43,254 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 12:30:50,913 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2757542.6666666665, ans=0.125 2023-10-09 12:30:56,169 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. 
limit=6.0 2023-10-09 12:30:56,529 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+02 2.946e+02 3.372e+02 3.986e+02 9.785e+02, threshold=6.744e+02, percent-clipped=2.0 2023-10-09 12:30:56,893 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2757542.6666666665, ans=0.0 2023-10-09 12:31:10,042 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2757589.3333333335, ans=0.2 2023-10-09 12:31:26,106 INFO [train.py:1031] (2/4) Epoch 14, batch 6200, loss[loss=0.2206, simple_loss=0.2808, pruned_loss=0.05989, ctc_loss=0.1014, over 16937.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2776, pruned_loss=0.06061, ctc_loss=0.1072, over 3316725.56 frames. ], batch size: 243, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 12:31:26,490 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2757682.6666666665, ans=0.0 2023-10-09 12:31:29,194 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2757682.6666666665, ans=0.07 2023-10-09 12:31:36,804 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2757682.6666666665, ans=0.125 2023-10-09 12:31:37,944 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2757729.3333333335, ans=0.0 2023-10-09 12:31:59,264 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2757776.0, ans=0.125 2023-10-09 12:32:02,696 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2757822.6666666665, ans=0.1 2023-10-09 12:32:03,745 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2757822.6666666665, ans=0.0 2023-10-09 12:32:11,112 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2757822.6666666665, ans=0.125 2023-10-09 12:32:13,339 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2757869.3333333335, ans=0.125 2023-10-09 12:32:17,784 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2757869.3333333335, ans=0.0 2023-10-09 12:32:18,715 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2757869.3333333335, ans=0.2 2023-10-09 12:32:26,028 INFO [train.py:1031] (2/4) Epoch 14, batch 6250, loss[loss=0.205, simple_loss=0.2813, pruned_loss=0.04798, ctc_loss=0.0816, over 16841.00 frames. ], tot_loss[loss=0.2203, simple_loss=0.2782, pruned_loss=0.05996, ctc_loss=0.1061, over 3321959.06 frames. 
2023-10-09 12:32:55,891 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+02 2.973e+02 3.416e+02 4.008e+02 8.801e+02, threshold=6.831e+02, percent-clipped=1.0
2023-10-09 12:33:07,330 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2758056.0, ans=0.125
2023-10-09 12:33:21,323 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:33:26,291 INFO [train.py:1031] (2/4) Epoch 14, batch 6300, loss[loss=0.1901, simple_loss=0.2602, pruned_loss=0.04522, ctc_loss=0.07403, over 16874.00 frames. ], tot_loss[loss=0.2158, simple_loss=0.277, pruned_loss=0.05705, ctc_loss=0.101, over 3303085.83 frames. ], batch size: 90, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:33:45,018 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2758196.0, ans=0.125
2023-10-09 12:33:46,755 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2758196.0, ans=0.1
2023-10-09 12:33:52,285 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2758242.6666666665, ans=0.125
2023-10-09 12:33:57,538 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2758242.6666666665, ans=0.1
2023-10-09 12:34:07,225 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2758289.3333333335, ans=0.1
2023-10-09 12:34:25,749 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2758336.0, ans=0.2
2023-10-09 12:34:28,479 INFO [train.py:1031] (2/4) Epoch 14, batch 6350, loss[loss=0.1919, simple_loss=0.2578, pruned_loss=0.04727, ctc_loss=0.07891, over 16920.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.278, pruned_loss=0.05815, ctc_loss=0.1028, over 3298853.58 frames. ], batch size: 229, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:34:42,184 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.61 vs. limit=22.5
2023-10-09 12:35:00,733 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 3.028e+02 3.570e+02 4.973e+02 1.101e+03, threshold=7.141e+02, percent-clipped=8.0
2023-10-09 12:35:09,015 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2758522.6666666665, ans=0.0
2023-10-09 12:35:10,156 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2758522.6666666665, ans=0.1
2023-10-09 12:35:32,009 INFO [train.py:1031] (2/4) Epoch 14, batch 6400, loss[loss=0.2635, simple_loss=0.3838, pruned_loss=0.05309, ctc_loss=0.09237, over 16284.00 frames. ], tot_loss[loss=0.222, simple_loss=0.2837, pruned_loss=0.05916, ctc_loss=0.1048, over 3293155.54 frames. ], batch size: 463, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:35:45,589 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2758662.6666666665, ans=0.0
2023-10-09 12:35:48,014 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0
2023-10-09 12:35:53,042 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2758662.6666666665, ans=0.0
2023-10-09 12:36:22,604 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:36:34,417 INFO [train.py:1031] (2/4) Epoch 14, batch 6450, loss[loss=0.2761, simple_loss=0.3588, pruned_loss=0.07014, ctc_loss=0.1328, over 15128.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2927, pruned_loss=0.06124, ctc_loss=0.1085, over 3289822.69 frames. ], batch size: 527, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:36:44,672 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.38 vs. limit=15.0
2023-10-09 12:36:49,116 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=22.5
2023-10-09 12:36:54,385 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.74 vs. limit=15.0
2023-10-09 12:37:09,122 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.527e+02 4.142e+02 5.294e+02 1.315e+03, threshold=8.284e+02, percent-clipped=10.0
2023-10-09 12:37:12,640 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.89 vs. limit=15.0
2023-10-09 12:37:13,382 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2758989.3333333335, ans=0.1
2023-10-09 12:37:17,267 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2758989.3333333335, ans=0.2
2023-10-09 12:37:17,304 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2758989.3333333335, ans=0.125
2023-10-09 12:37:22,031 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2758989.3333333335, ans=0.1
2023-10-09 12:37:33,338 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0
2023-10-09 12:37:35,851 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.69 vs. limit=10.0
2023-10-09 12:37:37,500 INFO [train.py:1031] (2/4) Epoch 14, batch 6500, loss[loss=0.2171, simple_loss=0.2984, pruned_loss=0.04896, ctc_loss=0.09493, over 16910.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2963, pruned_loss=0.0625, ctc_loss=0.1103, over 3293846.03 frames. ], batch size: 258, lr: 2.58e-03, grad_scale: 4.0
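The scaling.py:979 Whitening lines compare a per-module covariance statistic ("metric") against a limit. One common whitening metric is the ratio E[lambda^2] / (E[lambda])^2 over the eigenvalues lambda of the channel covariance, which equals 1.0 when the covariance is isotropic (white) and grows as energy concentrates in few directions. A sketch under that assumption; this is an illustrative stand-in, not necessarily the exact statistic scaling.py computes:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (frames, channels). Returns E[lam^2] / (E[lam])^2 over the
        eigenvalues of the channel covariance; 1.0 means perfectly white.
        An assumed proxy for the 'metric' in the Whitening lines."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)   # real, ascending
        return float((eigs ** 2).mean() / eigs.mean() ** 2)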
2023-10-09 12:37:38,560 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2759082.6666666665, ans=0.09899494936611666
2023-10-09 12:37:39,531 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2759082.6666666665, ans=0.0
2023-10-09 12:37:45,521 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2759082.6666666665, ans=0.0
2023-10-09 12:37:58,983 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2759129.3333333335, ans=0.1
2023-10-09 12:38:15,810 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=12.0
2023-10-09 12:38:39,316 INFO [train.py:1031] (2/4) Epoch 14, batch 6550, loss[loss=0.2132, simple_loss=0.3229, pruned_loss=0.03823, ctc_loss=0.06781, over 15169.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.2965, pruned_loss=0.06076, ctc_loss=0.1073, over 3294567.71 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:38:40,645 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2759316.0, ans=0.125
2023-10-09 12:38:43,540 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2759316.0, ans=0.05
2023-10-09 12:38:56,207 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.17 vs. limit=15.0
2023-10-09 12:38:58,674 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2759362.6666666665, ans=0.125
2023-10-09 12:39:06,330 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0
2023-10-09 12:39:13,987 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.392e+02 3.110e+02 3.501e+02 4.783e+02 9.305e+02, threshold=7.003e+02, percent-clipped=1.0
2023-10-09 12:39:14,442 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2759409.3333333335, ans=0.2
2023-10-09 12:39:19,445 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0
2023-10-09 12:39:41,348 INFO [train.py:1031] (2/4) Epoch 14, batch 6600, loss[loss=0.2349, simple_loss=0.2859, pruned_loss=0.06734, ctc_loss=0.1233, over 16754.00 frames. ], tot_loss[loss=0.2298, simple_loss=0.2948, pruned_loss=0.06088, ctc_loss=0.1074, over 3280106.84 frames. ], batch size: 272, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:02:36,995 is replaced below; (continuing original order)
2023-10-09 12:40:28,267 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2759689.3333333335, ans=0.1
2023-10-09 12:40:43,321 INFO [train.py:1031] (2/4) Epoch 14, batch 6650, loss[loss=0.2529, simple_loss=0.2886, pruned_loss=0.08009, ctc_loss=0.1427, over 16442.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2872, pruned_loss=0.06082, ctc_loss=0.1068, over 3280580.92 frames. ], batch size: 417, lr: 2.58e-03, grad_scale: 2.0
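The scaling.py:199 ScheduledFloat lines report the current value ("ans") of a hyperparameter that is scheduled as a function of batch_count; values such as conv_skip_rate=0.0 or dropout_p=0.1 are the schedule evaluated at the current batch. A minimal piecewise-linear scheduler of that kind; the breakpoints below are invented for illustration, not the schedules used in this run:

    def scheduled_float(batch_count: float, schedule) -> float:
        """schedule: [(batch_count, value), ...] sorted by batch_count.
        Linearly interpolates between breakpoints, clamping at the ends."""
        b0, v0 = schedule[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in schedule[1:]:
            if batch_count <= b1:
                frac = (batch_count - b0) / (b1 - b0)
                return v0 + frac * (v1 - v0)
            b0, v0 = b1, v1
        return v0

    # e.g. a skip rate that ramps 0.2 -> 0.0 over the first 20k batches
    # is 0.0 by batch_count=2759082.0 (hypothetical schedule):
    print(scheduled_float(2759082.0, [(0.0, 0.2), (20000.0, 0.0)]))  # 0.0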
2023-10-09 12:40:44,611 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2759782.6666666665, ans=0.125
2023-10-09 12:40:46,128 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.49 vs. limit=22.5
2023-10-09 12:40:56,000 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2759829.3333333335, ans=0.125
2023-10-09 12:41:14,905 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:41:18,757 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.057e+02 3.351e+02 3.889e+02 6.888e+02, threshold=6.703e+02, percent-clipped=0.0
2023-10-09 12:41:19,285 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2759922.6666666665, ans=0.125
2023-10-09 12:41:22,686 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2759922.6666666665, ans=0.125
2023-10-09 12:41:45,278 INFO [train.py:1031] (2/4) Epoch 14, batch 6700, loss[loss=0.24, simple_loss=0.3077, pruned_loss=0.06348, ctc_loss=0.1132, over 16793.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2904, pruned_loss=0.06106, ctc_loss=0.1078, over 3295764.63 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:41:55,560 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2760016.0, ans=0.04949747468305833
2023-10-09 12:41:55,806 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0
2023-10-09 12:42:03,060 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.10 vs. limit=10.0
2023-10-09 12:42:12,114 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2760109.3333333335, ans=0.0
2023-10-09 12:42:13,129 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2760109.3333333335, ans=0.0
2023-10-09 12:42:16,424 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2760109.3333333335, ans=0.1
2023-10-09 12:42:18,318 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0
2023-10-09 12:42:23,725 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2760156.0, ans=0.125
2023-10-09 12:42:33,715 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=22.5
2023-10-09 12:42:35,186 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.89 vs. limit=15.0
2023-10-09 12:42:44,222 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.62 vs. limit=22.5
2023-10-09 12:42:48,658 INFO [train.py:1031] (2/4) Epoch 14, batch 6750, loss[loss=0.2946, simple_loss=0.3549, pruned_loss=0.08605, ctc_loss=0.1555, over 16873.00 frames. ], tot_loss[loss=0.2403, simple_loss=0.3036, pruned_loss=0.06525, ctc_loss=0.1162, over 3304888.73 frames. ], batch size: 242, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:43:00,192 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0
2023-10-09 12:43:25,356 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.299e+02 3.277e+02 3.937e+02 4.777e+02 6.969e+02, threshold=7.873e+02, percent-clipped=1.0
2023-10-09 12:43:32,763 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2760389.3333333335, ans=0.5
2023-10-09 12:43:49,734 INFO [train.py:1031] (2/4) Epoch 14, batch 6800, loss[loss=0.2263, simple_loss=0.293, pruned_loss=0.06, ctc_loss=0.09906, over 17077.00 frames. ], tot_loss[loss=0.2406, simple_loss=0.3018, pruned_loss=0.06615, ctc_loss=0.1175, over 3311901.06 frames. ], batch size: 83, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:43:55,593 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=22.5
2023-10-09 12:43:55,651 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0
2023-10-09 12:44:00,747 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2760529.3333333335, ans=0.125
2023-10-09 12:44:03,459 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2760529.3333333335, ans=0.125
2023-10-09 12:44:03,500 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2760529.3333333335, ans=0.07
2023-10-09 12:44:07,890 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2760529.3333333335, ans=0.125
2023-10-09 12:44:18,675 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2760576.0, ans=0.0
2023-10-09 12:44:39,818 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2760669.3333333335, ans=0.0
2023-10-09 12:44:46,779 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2760669.3333333335, ans=0.0
2023-10-09 12:44:51,408 INFO [train.py:1031] (2/4) Epoch 14, batch 6850, loss[loss=0.2577, simple_loss=0.3083, pruned_loss=0.07485, ctc_loss=0.1436, over 16789.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.3002, pruned_loss=0.0651, ctc_loss=0.1158, over 3319233.44 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:45:06,318 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2760762.6666666665, ans=0.1
2023-10-09 12:45:08,470 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.62 vs. limit=15.0
2023-10-09 12:45:14,810 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2760809.3333333335, ans=0.0
2023-10-09 12:45:28,869 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 3.178e+02 3.827e+02 4.505e+02 1.079e+03, threshold=7.655e+02, percent-clipped=2.0
2023-10-09 12:45:43,212 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2760902.6666666665, ans=0.0
2023-10-09 12:45:54,910 INFO [train.py:1031] (2/4) Epoch 14, batch 6900, loss[loss=0.2588, simple_loss=0.311, pruned_loss=0.07754, ctc_loss=0.1287, over 16810.00 frames. ], tot_loss[loss=0.2402, simple_loss=0.3016, pruned_loss=0.06599, ctc_loss=0.117, over 3316288.47 frames. ], batch size: 164, lr: 2.58e-03, grad_scale: 8.0
2023-10-09 12:45:56,491 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2760949.3333333335, ans=0.0
2023-10-09 12:45:56,506 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2760949.3333333335, ans=0.1
2023-10-09 12:46:12,174 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2760996.0, ans=0.0
2023-10-09 12:46:29,224 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2761042.6666666665, ans=0.125
2023-10-09 12:46:38,591 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2761089.3333333335, ans=0.125
2023-10-09 12:46:41,695 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2761089.3333333335, ans=0.125
2023-10-09 12:46:55,997 INFO [train.py:1031] (2/4) Epoch 14, batch 6950, loss[loss=0.2251, simple_loss=0.2996, pruned_loss=0.05515, ctc_loss=0.1008, over 16832.00 frames. ], tot_loss[loss=0.2453, simple_loss=0.3048, pruned_loss=0.0686, ctc_loss=0.1215, over 3317204.81 frames. ], batch size: 242, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:47:05,965 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0
2023-10-09 12:47:26,075 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2761276.0, ans=0.125
2023-10-09 12:47:28,284 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.28 vs. limit=15.0
2023-10-09 12:47:34,797 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+02 3.216e+02 3.562e+02 4.288e+02 5.901e+02, threshold=7.125e+02, percent-clipped=0.0
2023-10-09 12:47:53,019 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2761369.3333333335, ans=0.125
2023-10-09 12:47:55,787 INFO [train.py:1031] (2/4) Epoch 14, batch 7000, loss[loss=0.2576, simple_loss=0.2895, pruned_loss=0.08451, ctc_loss=0.1416, over 16793.00 frames. ], tot_loss[loss=0.2415, simple_loss=0.3, pruned_loss=0.06748, ctc_loss=0.1201, over 3319708.11 frames. ], batch size: 164, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:48:02,132 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2761416.0, ans=0.0
2023-10-09 12:48:17,206 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.35 vs. limit=15.0
2023-10-09 12:48:31,349 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2761556.0, ans=0.0
2023-10-09 12:48:31,446 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2761556.0, ans=0.07
2023-10-09 12:48:43,539 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0
2023-10-09 12:48:46,000 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2761602.6666666665, ans=0.125
2023-10-09 12:48:53,321 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2761602.6666666665, ans=0.125
2023-10-09 12:48:56,167 INFO [train.py:1031] (2/4) Epoch 14, batch 7050, loss[loss=0.179, simple_loss=0.237, pruned_loss=0.04328, ctc_loss=0.08621, over 16225.00 frames. ], tot_loss[loss=0.2339, simple_loss=0.2917, pruned_loss=0.06492, ctc_loss=0.1159, over 3319146.48 frames. ], batch size: 463, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:49:38,090 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 2.795e+02 3.132e+02 3.641e+02 6.976e+02, threshold=6.264e+02, percent-clipped=0.0
2023-10-09 12:49:42,153 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2761789.3333333335, ans=0.125
2023-10-09 12:49:48,290 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2761836.0, ans=0.0
2023-10-09 12:49:58,340 INFO [train.py:1031] (2/4) Epoch 14, batch 7100, loss[loss=0.1927, simple_loss=0.2528, pruned_loss=0.05017, ctc_loss=0.08045, over 16941.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2824, pruned_loss=0.0626, ctc_loss=0.1114, over 3317291.45 frames. ], batch size: 78, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:50:41,671 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2762022.6666666665, ans=0.0
2023-10-09 12:50:42,767 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2762022.6666666665, ans=0.125
2023-10-09 12:50:58,200 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=2762069.3333333335, ans=10.0
2023-10-09 12:51:00,099 INFO [train.py:1031] (2/4) Epoch 14, batch 7150, loss[loss=0.22, simple_loss=0.2633, pruned_loss=0.06489, ctc_loss=0.1176, over 16793.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2751, pruned_loss=0.06221, ctc_loss=0.1105, over 3312192.59 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:51:02,250 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2762116.0, ans=0.1
2023-10-09 12:51:38,503 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2762256.0, ans=0.04949747468305833
2023-10-09 12:51:42,360 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2762256.0, ans=0.2
2023-10-09 12:51:43,559 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+02 3.162e+02 3.626e+02 4.175e+02 1.632e+03, threshold=7.251e+02, percent-clipped=2.0
2023-10-09 12:52:00,447 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=22.5
2023-10-09 12:52:00,875 INFO [train.py:1031] (2/4) Epoch 14, batch 7200, loss[loss=0.2425, simple_loss=0.2912, pruned_loss=0.07147, ctc_loss=0.1272, over 16906.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2755, pruned_loss=0.0638, ctc_loss=0.1126, over 3306280.93 frames. ], batch size: 309, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:52:02,161 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2762349.3333333335, ans=0.0
2023-10-09 12:52:02,280 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2762349.3333333335, ans=0.125
2023-10-09 12:52:16,087 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2762396.0, ans=0.0
2023-10-09 12:52:25,446 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.05 vs. limit=22.5
2023-10-09 12:52:27,291 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2762442.6666666665, ans=0.125
2023-10-09 12:52:40,823 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2762489.3333333335, ans=0.0
2023-10-09 12:52:42,289 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0
2023-10-09 12:52:49,269 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2762536.0, ans=0.125
2023-10-09 12:53:03,086 INFO [train.py:1031] (2/4) Epoch 14, batch 7250, loss[loss=0.2081, simple_loss=0.2853, pruned_loss=0.04786, ctc_loss=0.08791, over 16890.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2781, pruned_loss=0.06476, ctc_loss=0.1137, over 3314773.98 frames. ], batch size: 215, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:53:27,933 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2762629.3333333335, ans=0.125
2023-10-09 12:53:35,170 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=2762676.0, ans=0.2
2023-10-09 12:53:36,101 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2762676.0, ans=0.125
2023-10-09 12:53:49,738 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.375e+02 3.091e+02 3.554e+02 4.025e+02 7.139e+02, threshold=7.107e+02, percent-clipped=0.0
2023-10-09 12:53:56,494 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2762769.3333333335, ans=0.125
2023-10-09 12:54:07,989 INFO [train.py:1031] (2/4) Epoch 14, batch 7300, loss[loss=0.2132, simple_loss=0.2497, pruned_loss=0.06564, ctc_loss=0.1133, over 16592.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2795, pruned_loss=0.06319, ctc_loss=0.1116, over 3314353.20 frames. ], batch size: 110, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:54:09,327 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 12:54:24,883 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0
2023-10-09 12:54:40,004 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0
2023-10-09 12:54:53,693 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2762956.0, ans=0.125
2023-10-09 12:55:07,498 INFO [train.py:1031] (2/4) Epoch 14, batch 7350, loss[loss=0.2082, simple_loss=0.266, pruned_loss=0.05645, ctc_loss=0.09355, over 16880.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2789, pruned_loss=0.0634, ctc_loss=0.1118, over 3320941.18 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:55:09,518 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2763049.3333333335, ans=0.2
2023-10-09 12:55:14,529 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2763049.3333333335, ans=0.035
2023-10-09 12:55:20,497 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2763096.0, ans=0.025
2023-10-09 12:55:26,921 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2763096.0, ans=0.2
2023-10-09 12:55:31,471 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0
2023-10-09 12:55:50,541 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.493e+02 3.022e+02 3.585e+02 4.093e+02 1.089e+03, threshold=7.169e+02, percent-clipped=4.0
2023-10-09 12:55:53,862 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=22.5
2023-10-09 12:56:07,704 INFO [train.py:1031] (2/4) Epoch 14, batch 7400, loss[loss=0.23, simple_loss=0.282, pruned_loss=0.06606, ctc_loss=0.1147, over 16904.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2802, pruned_loss=0.06381, ctc_loss=0.1122, over 3323837.57 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:56:24,064 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2763329.3333333335, ans=0.0
2023-10-09 12:57:01,922 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2763469.3333333335, ans=0.2
2023-10-09 12:57:09,483 INFO [train.py:1031] (2/4) Epoch 14, batch 7450, loss[loss=0.1899, simple_loss=0.2595, pruned_loss=0.04454, ctc_loss=0.07814, over 16727.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2859, pruned_loss=0.06362, ctc_loss=0.112, over 3315837.43 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 12:57:17,963 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2763516.0, ans=0.0
2023-10-09 12:57:19,913 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2763516.0, ans=0.125
2023-10-09 12:57:28,078 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2763562.6666666665, ans=0.125
2023-10-09 12:57:41,029 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2763609.3333333335, ans=0.125
2023-10-09 12:57:42,715 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2763609.3333333335, ans=0.125
2023-10-09 12:57:52,360 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2763656.0, ans=0.125
2023-10-09 12:57:58,529 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+02 3.056e+02 3.585e+02 4.525e+02 9.951e+02, threshold=7.170e+02, percent-clipped=3.0
2023-10-09 12:58:13,551 INFO [train.py:1031] (2/4) Epoch 14, batch 7500, loss[loss=0.2077, simple_loss=0.2979, pruned_loss=0.04172, ctc_loss=0.08502, over 16822.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2856, pruned_loss=0.06038, ctc_loss=0.1069, over 3309742.76 frames. ], batch size: 291, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:58:27,327 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2763796.0, ans=0.125
2023-10-09 12:58:29,989 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2763796.0, ans=0.0
2023-10-09 12:58:36,838 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2763842.6666666665, ans=0.025
2023-10-09 12:58:51,760 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2763889.3333333335, ans=0.05
2023-10-09 12:59:05,341 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2763936.0, ans=0.1
2023-10-09 12:59:13,832 INFO [train.py:1031] (2/4) Epoch 14, batch 7550, loss[loss=0.2238, simple_loss=0.285, pruned_loss=0.06159, ctc_loss=0.09876, over 16895.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2844, pruned_loss=0.05807, ctc_loss=0.1028, over 3308120.49 frames. ], batch size: 82, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 12:59:31,548 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2764029.3333333335, ans=0.125
2023-10-09 12:59:37,355 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.71 vs. limit=15.0
2023-10-09 12:59:48,357 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2764076.0, ans=0.125
2023-10-09 12:59:56,487 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2764122.6666666665, ans=0.1
2023-10-09 12:59:59,784 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.575e+02 3.477e+02 4.054e+02 5.168e+02 9.952e+02, threshold=8.108e+02, percent-clipped=5.0
2023-10-09 13:00:09,147 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2764169.3333333335, ans=0.2
2023-10-09 13:00:14,751 INFO [train.py:1031] (2/4) Epoch 14, batch 7600, loss[loss=0.2096, simple_loss=0.266, pruned_loss=0.05822, ctc_loss=0.0921, over 16792.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2827, pruned_loss=0.05977, ctc_loss=0.1055, over 3298282.20 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:00:48,660 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=2764309.3333333335, ans=0.025
2023-10-09 13:01:16,410 INFO [train.py:1031] (2/4) Epoch 14, batch 7650, loss[loss=0.2159, simple_loss=0.2659, pruned_loss=0.06232, ctc_loss=0.1029, over 16728.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2817, pruned_loss=0.06156, ctc_loss=0.108, over 3297921.23 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:01:20,844 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2764449.3333333335, ans=0.0
2023-10-09 13:01:28,297 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2764496.0, ans=0.125
2023-10-09 13:01:37,035 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:01:40,949 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:01:48,914 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0
2023-10-09 13:01:49,626 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2764542.6666666665, ans=0.125
2023-10-09 13:02:03,962 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+02 3.219e+02 3.717e+02 4.421e+02 1.818e+03, threshold=7.434e+02, percent-clipped=3.0
2023-10-09 13:02:07,481 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2764636.0, ans=0.0
2023-10-09 13:02:16,491 INFO [train.py:1031] (2/4) Epoch 14, batch 7700, loss[loss=0.2127, simple_loss=0.2664, pruned_loss=0.05827, ctc_loss=0.106, over 16382.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2807, pruned_loss=0.06183, ctc_loss=0.1086, over 3304563.75 frames. ], batch size: 463, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:02:36,995 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.87 vs. limit=15.0
2023-10-09 13:02:55,589 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2764822.6666666665, ans=0.2
2023-10-09 13:03:09,765 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.95 vs. limit=15.0
2023-10-09 13:03:10,260 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2764869.3333333335, ans=0.125
2023-10-09 13:03:17,588 INFO [train.py:1031] (2/4) Epoch 14, batch 7750, loss[loss=0.2274, simple_loss=0.2667, pruned_loss=0.07033, ctc_loss=0.1184, over 16784.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2788, pruned_loss=0.0624, ctc_loss=0.1096, over 3313222.47 frames. ], batch size: 188, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:03:17,909 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2764916.0, ans=0.1
2023-10-09 13:03:17,917 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2764916.0, ans=0.125
2023-10-09 13:03:19,389 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2764916.0, ans=0.1
2023-10-09 13:03:28,001 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2764916.0, ans=0.0
2023-10-09 13:03:46,733 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2765009.3333333335, ans=0.125
2023-10-09 13:03:55,367 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2765056.0, ans=0.125
2023-10-09 13:04:08,219 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.438e+02 3.190e+02 3.453e+02 4.126e+02 8.582e+02, threshold=6.905e+02, percent-clipped=1.0
2023-10-09 13:04:10,845 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.04 vs. limit=15.0
2023-10-09 13:04:14,203 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2765102.6666666665, ans=0.04949747468305833
2023-10-09 13:04:16,505 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2765102.6666666665, ans=0.125
2023-10-09 13:04:20,483 INFO [train.py:1031] (2/4) Epoch 14, batch 7800, loss[loss=0.2407, simple_loss=0.2967, pruned_loss=0.06955, ctc_loss=0.1139, over 16948.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2813, pruned_loss=0.06486, ctc_loss=0.1129, over 3301244.92 frames. ], batch size: 258, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:04:36,461 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=22.5
2023-10-09 13:04:41,762 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=22.5
2023-10-09 13:05:21,676 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2765336.0, ans=0.125
2023-10-09 13:05:23,384 INFO [train.py:1031] (2/4) Epoch 14, batch 7850, loss[loss=0.1754, simple_loss=0.2042, pruned_loss=0.05655, ctc_loss=0.08372, over 10649.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2847, pruned_loss=0.06549, ctc_loss=0.1127, over 3290982.07 frames. ], batch size: 37, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:05:27,524 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2765382.6666666665, ans=0.125
2023-10-09 13:05:27,578 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2765382.6666666665, ans=0.0
2023-10-09 13:05:45,125 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2765429.3333333335, ans=0.125
2023-10-09 13:05:48,757 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2765476.0, ans=0.05
2023-10-09 13:06:14,908 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.327e+02 3.051e+02 3.784e+02 4.491e+02 1.708e+03, threshold=7.568e+02, percent-clipped=4.0
2023-10-09 13:06:15,306 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2765569.3333333335, ans=0.0
2023-10-09 13:06:24,964 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2765616.0, ans=0.125
2023-10-09 13:06:26,259 INFO [train.py:1031] (2/4) Epoch 14, batch 7900, loss[loss=0.2202, simple_loss=0.2912, pruned_loss=0.05437, ctc_loss=0.1013, over 16867.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2887, pruned_loss=0.06325, ctc_loss=0.1099, over 3289783.35 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:06:32,642 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0
2023-10-09 13:07:17,147 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.57 vs. limit=15.0
2023-10-09 13:07:27,533 INFO [train.py:1031] (2/4) Epoch 14, batch 7950, loss[loss=0.2232, simple_loss=0.2803, pruned_loss=0.06216, ctc_loss=0.1045, over 16920.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.289, pruned_loss=0.06286, ctc_loss=0.1095, over 3301710.92 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:07:27,916 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2765849.3333333335, ans=0.125
2023-10-09 13:07:59,745 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2765942.6666666665, ans=0.09899494936611666
2023-10-09 13:08:19,020 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 2.992e+02 3.305e+02 3.981e+02 8.015e+02, threshold=6.609e+02, percent-clipped=1.0
2023-10-09 13:08:28,685 INFO [train.py:1031] (2/4) Epoch 14, batch 8000, loss[loss=0.2198, simple_loss=0.2755, pruned_loss=0.06152, ctc_loss=0.1026, over 16950.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2869, pruned_loss=0.06393, ctc_loss=0.1113, over 3312598.68 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 4.0
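The grad_scale value at the end of each train.py:1031 line moves between 2.0, 4.0 and 8.0 through this stretch. That pattern is characteristic of dynamic loss scaling in fp16 training, where the scale grows after a run of overflow-free steps and is cut back when gradients overflow; the sketch below is a generic torch.cuda.amp loop illustrating that mechanism, not the training loop that produced this log:

    import torch

    scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling

    def training_step(model, optimizer, features, targets, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(features), targets)
        scaler.scale(loss).backward()  # backprop on the scaled loss
        scaler.step(optimizer)         # unscales grads; skips step on inf/nan
        scaler.update()                # grows or shrinks the scale
        return loss.detach()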
2023-10-09 13:08:43,704 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2766129.3333333335, ans=0.125
2023-10-09 13:08:45,731 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2766129.3333333335, ans=0.2
2023-10-09 13:08:45,810 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2766129.3333333335, ans=0.2
2023-10-09 13:09:29,361 INFO [train.py:1031] (2/4) Epoch 14, batch 8050, loss[loss=0.2225, simple_loss=0.2766, pruned_loss=0.06195, ctc_loss=0.1111, over 16702.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2846, pruned_loss=0.06428, ctc_loss=0.1117, over 3313096.28 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:10:08,538 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.85 vs. limit=15.0
2023-10-09 13:10:22,705 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.320e+02 3.190e+02 3.616e+02 4.250e+02 6.056e+02, threshold=7.233e+02, percent-clipped=0.0
2023-10-09 13:10:24,387 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=22.5
2023-10-09 13:10:30,090 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2766549.3333333335, ans=0.125
2023-10-09 13:10:30,822 INFO [train.py:1031] (2/4) Epoch 14, batch 8100, loss[loss=0.235, simple_loss=0.2734, pruned_loss=0.07331, ctc_loss=0.1253, over 16746.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2824, pruned_loss=0.06482, ctc_loss=0.1126, over 3303223.75 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:10:56,058 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2766642.6666666665, ans=0.2
2023-10-09 13:11:31,971 INFO [train.py:1031] (2/4) Epoch 14, batch 8150, loss[loss=0.2363, simple_loss=0.2709, pruned_loss=0.07501, ctc_loss=0.1291, over 16623.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2801, pruned_loss=0.06478, ctc_loss=0.1128, over 3310257.34 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:11:40,072 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2766782.6666666665, ans=0.0
2023-10-09 13:11:42,572 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0
2023-10-09 13:12:04,821 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2766876.0, ans=0.1
2023-10-09 13:12:26,044 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 3.114e+02 3.645e+02 4.244e+02 8.238e+02, threshold=7.291e+02, percent-clipped=3.0
2023-10-09 13:12:26,392 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2766969.3333333335, ans=0.0
2023-10-09 13:12:31,793 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2766969.3333333335, ans=0.5
2023-10-09 13:12:33,455 INFO [train.py:1031] (2/4) Epoch 14, batch 8200, loss[loss=0.2243, simple_loss=0.2947, pruned_loss=0.05676, ctc_loss=0.1008, over 16790.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2814, pruned_loss=0.06285, ctc_loss=0.1103, over 3313652.68 frames. ], batch size: 272, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:12:45,091 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:13:01,450 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2767109.3333333335, ans=0.2
2023-10-09 13:13:37,129 INFO [train.py:1031] (2/4) Epoch 14, batch 8250, loss[loss=0.2139, simple_loss=0.3105, pruned_loss=0.04127, ctc_loss=0.08656, over 16913.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2899, pruned_loss=0.06166, ctc_loss=0.1092, over 3318734.55 frames. ], batch size: 292, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:13:37,529 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2767249.3333333335, ans=0.125
2023-10-09 13:13:40,754 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2767249.3333333335, ans=0.125
2023-10-09 13:14:01,259 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2767342.6666666665, ans=0.125
2023-10-09 13:14:32,770 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.653e+02 3.032e+02 3.705e+02 6.938e+02, threshold=6.064e+02, percent-clipped=0.0
2023-10-09 13:14:40,099 INFO [train.py:1031] (2/4) Epoch 14, batch 8300, loss[loss=0.1664, simple_loss=0.2271, pruned_loss=0.03914, ctc_loss=0.06819, over 16747.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2894, pruned_loss=0.05791, ctc_loss=0.1032, over 3321296.35 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 8.0
2023-10-09 13:14:49,850 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2767482.6666666665, ans=0.125
2023-10-09 13:14:51,929 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2767529.3333333335, ans=0.5
2023-10-09 13:15:08,138 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=15.0
2023-10-09 13:15:10,056 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2767576.0, ans=0.07
2023-10-09 13:15:11,784 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2767576.0, ans=0.125
2023-10-09 13:15:18,182 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.37 vs. limit=22.5
2023-10-09 13:15:24,116 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:15:42,537 INFO [train.py:1031] (2/4) Epoch 14, batch 8350, loss[loss=0.1551, simple_loss=0.2154, pruned_loss=0.03489, ctc_loss=0.06235, over 16948.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.2885, pruned_loss=0.05624, ctc_loss=0.1003, over 3316029.40 frames. ], batch size: 86, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:15:44,642 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2767716.0, ans=0.0
2023-10-09 13:15:49,884 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2767716.0, ans=0.2
2023-10-09 13:15:56,537 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2767762.6666666665, ans=0.0
2023-10-09 13:16:15,184 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2767809.3333333335, ans=0.0
2023-10-09 13:16:24,357 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2767856.0, ans=0.125
2023-10-09 13:16:38,123 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.846e+02 3.361e+02 4.142e+02 6.688e+02, threshold=6.722e+02, percent-clipped=2.0
2023-10-09 13:16:44,683 INFO [train.py:1031] (2/4) Epoch 14, batch 8400, loss[loss=0.2189, simple_loss=0.2859, pruned_loss=0.056, ctc_loss=0.09997, over 16895.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2834, pruned_loss=0.05297, ctc_loss=0.09507, over 3315028.23 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 8.0
2023-10-09 13:17:06,641 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.38 vs. limit=6.0
2023-10-09 13:17:12,913 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2768042.6666666665, ans=0.0
2023-10-09 13:17:13,366 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=22.5
2023-10-09 13:17:39,589 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=22.5
2023-10-09 13:17:48,487 INFO [train.py:1031] (2/4) Epoch 14, batch 8450, loss[loss=0.2469, simple_loss=0.3177, pruned_loss=0.06472, ctc_loss=0.1165, over 16830.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2885, pruned_loss=0.0543, ctc_loss=0.0977, over 3311581.92 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:17:50,113 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0
2023-10-09 13:17:57,832 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2768182.6666666665, ans=0.125
2023-10-09 13:17:57,834 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2768182.6666666665, ans=0.2
2023-10-09 13:18:45,222 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+02 3.438e+02 4.157e+02 5.474e+02 9.633e+02, threshold=8.314e+02, percent-clipped=10.0
2023-10-09 13:18:48,296 INFO [train.py:1031] (2/4) Epoch 14, batch 8500, loss[loss=0.3074, simple_loss=0.327, pruned_loss=0.1048, ctc_loss=0.1956, over 16802.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2915, pruned_loss=0.05757, ctc_loss=0.1033, over 3309542.30 frames. ], batch size: 384, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:18:53,339 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2768416.0, ans=0.125
2023-10-09 13:19:27,535 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2768556.0, ans=0.5
2023-10-09 13:19:49,005 INFO [train.py:1031] (2/4) Epoch 14, batch 8550, loss[loss=0.2452, simple_loss=0.2988, pruned_loss=0.07161, ctc_loss=0.1206, over 16954.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2893, pruned_loss=0.05954, ctc_loss=0.1063, over 3318383.92 frames. ], batch size: 86, lr: 2.58e-03, grad_scale: 2.0
2023-10-09 13:19:54,205 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0
2023-10-09 13:20:26,141 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2768742.6666666665, ans=0.2
2023-10-09 13:20:47,544 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:20:51,138 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.120e+02 3.650e+02 4.330e+02 6.615e+02, threshold=7.300e+02, percent-clipped=0.0
2023-10-09 13:20:52,617 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2768882.6666666665, ans=0.0
2023-10-09 13:20:53,245 INFO [train.py:1031] (2/4) Epoch 14, batch 8600, loss[loss=0.1958, simple_loss=0.2508, pruned_loss=0.05271, ctc_loss=0.08815, over 16699.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2868, pruned_loss=0.05956, ctc_loss=0.1062, over 3299246.50 frames. ], batch size: 151, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:21:04,721 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 13:21:21,105 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2768976.0, ans=0.125
2023-10-09 13:21:31,982 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2769022.6666666665, ans=0.125
2023-10-09 13:21:48,916 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2769069.3333333335, ans=0.05
2023-10-09 13:21:55,018 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.07 vs. limit=12.0
2023-10-09 13:21:56,024 INFO [train.py:1031] (2/4) Epoch 14, batch 8650, loss[loss=0.197, simple_loss=0.273, pruned_loss=0.04387, ctc_loss=0.08332, over 16871.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2816, pruned_loss=0.05614, ctc_loss=0.1001, over 3291501.56 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:22:05,117 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2769116.0, ans=0.0
2023-10-09 13:22:21,834 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0
2023-10-09 13:22:48,575 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2769302.6666666665, ans=0.0
2023-10-09 13:22:59,313 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+02 2.781e+02 3.323e+02 4.118e+02 1.274e+03, threshold=6.646e+02, percent-clipped=1.0
2023-10-09 13:23:00,362 INFO [train.py:1031] (2/4) Epoch 14, batch 8700, loss[loss=0.2005, simple_loss=0.2715, pruned_loss=0.04789, ctc_loss=0.08434, over 16785.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2805, pruned_loss=0.05429, ctc_loss=0.09726, over 3288227.67 frames. ], batch size: 176, lr: 2.58e-03, grad_scale: 4.0
2023-10-09 13:23:11,523 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2769396.0, ans=0.1
2023-10-09 13:23:35,262 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2769489.3333333335, ans=0.125
2023-10-09 13:23:40,595 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2769489.3333333335, ans=0.125
2023-10-09 13:23:44,521 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2769489.3333333335, ans=0.2
2023-10-09 13:23:53,462 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=22.5
2023-10-09 13:23:58,880 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2769536.0, ans=0.5
2023-10-09 13:24:00,648 INFO [train.py:1031] (2/4) Epoch 14, batch 8750, loss[loss=0.1594, simple_loss=0.2299, pruned_loss=0.03298, ctc_loss=0.05762, over 16681.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.2829, pruned_loss=0.05386, ctc_loss=0.09721, over 3298821.50 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 2.0
], tot_loss[loss=0.2148, simple_loss=0.2829, pruned_loss=0.05386, ctc_loss=0.09721, over 3298821.50 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:24:05,989 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2769582.6666666665, ans=0.0 2023-10-09 13:24:26,587 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2769676.0, ans=0.2 2023-10-09 13:24:27,627 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2769676.0, ans=0.04949747468305833 2023-10-09 13:24:28,732 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2769676.0, ans=0.2 2023-10-09 13:24:32,775 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=22.5 2023-10-09 13:24:36,575 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.53 vs. limit=10.0 2023-10-09 13:24:41,644 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2769722.6666666665, ans=0.05 2023-10-09 13:24:54,252 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=22.5 2023-10-09 13:24:55,505 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2023-10-09 13:25:02,688 INFO [train.py:1031] (2/4) Epoch 14, batch 8800, loss[loss=0.1619, simple_loss=0.2485, pruned_loss=0.02736, ctc_loss=0.05139, over 16793.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.2816, pruned_loss=0.05094, ctc_loss=0.09232, over 3293849.63 frames. ], batch size: 164, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:25:03,717 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.673e+02 3.234e+02 4.604e+02 9.306e+02, threshold=6.469e+02, percent-clipped=8.0 2023-10-09 13:25:08,125 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=12.0 2023-10-09 13:25:12,621 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2769816.0, ans=0.125 2023-10-09 13:25:48,409 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2769956.0, ans=0.0 2023-10-09 13:25:50,543 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2769956.0, ans=0.0 2023-10-09 13:25:56,031 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2770002.6666666665, ans=0.125 2023-10-09 13:26:05,140 INFO [train.py:1031] (2/4) Epoch 14, batch 8850, loss[loss=0.173, simple_loss=0.2577, pruned_loss=0.0316, ctc_loss=0.06295, over 16878.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2772, pruned_loss=0.04671, ctc_loss=0.08499, over 3301572.87 frames. 
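Note on reading the loss fields: the leading loss value is a fixed linear combination of the other three. For the tot_loss at batch 8750 above, 0.5 * 0.2829 + 0.05386 + 0.2 * 0.09721 ≈ 0.2148, so the records are consistent with loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss; the coefficients here are read off the logged numbers rather than taken from the training code:

    # Consistency check against the tot_loss record for epoch 14, batch 8750.
    simple_loss, pruned_loss, ctc_loss = 0.2829, 0.05386, 0.09721
    loss = 0.5 * simple_loss + 1.0 * pruned_loss + 0.2 * ctc_loss
    assert abs(loss - 0.2148) < 1e-3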
], batch size: 202, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:26:14,956 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.15 vs. limit=15.0 2023-10-09 13:26:35,570 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2770142.6666666665, ans=0.0 2023-10-09 13:26:37,252 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2770142.6666666665, ans=0.1 2023-10-09 13:26:44,427 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2770189.3333333335, ans=0.0 2023-10-09 13:27:05,707 INFO [train.py:1031] (2/4) Epoch 14, batch 8900, loss[loss=0.2557, simple_loss=0.2902, pruned_loss=0.08208, ctc_loss=0.1424, over 16604.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2752, pruned_loss=0.04654, ctc_loss=0.08406, over 3299634.03 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:27:08,454 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+02 2.336e+02 2.693e+02 3.509e+02 6.659e+02, threshold=5.387e+02, percent-clipped=1.0 2023-10-09 13:27:27,487 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2770329.3333333335, ans=0.2 2023-10-09 13:27:50,268 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2770422.6666666665, ans=0.0 2023-10-09 13:27:50,278 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2770422.6666666665, ans=0.1 2023-10-09 13:27:52,406 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2770422.6666666665, ans=0.04949747468305833 2023-10-09 13:27:53,321 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2770422.6666666665, ans=0.125 2023-10-09 13:28:08,425 INFO [train.py:1031] (2/4) Epoch 14, batch 8950, loss[loss=0.2126, simple_loss=0.2667, pruned_loss=0.05954, ctc_loss=0.09876, over 11873.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2741, pruned_loss=0.0502, ctc_loss=0.08977, over 3298261.32 frames. 
], batch size: 35, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:28:24,535 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2770562.6666666665, ans=0.125 2023-10-09 13:28:48,268 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2770656.0, ans=0.0 2023-10-09 13:28:50,869 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2770656.0, ans=0.125 2023-10-09 13:28:50,958 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2770656.0, ans=0.125 2023-10-09 13:28:55,785 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2770656.0, ans=0.125 2023-10-09 13:28:56,896 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2770702.6666666665, ans=0.05 2023-10-09 13:29:10,854 INFO [train.py:1031] (2/4) Epoch 14, batch 9000, loss[loss=0.2284, simple_loss=0.2704, pruned_loss=0.06845, ctc_loss=0.1238, over 16738.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2712, pruned_loss=0.0533, ctc_loss=0.09453, over 3299225.33 frames. ], batch size: 292, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:29:10,855 INFO [train.py:1054] (2/4) Computing validation loss 2023-10-09 13:29:26,446 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2412, simple_loss=0.3097, pruned_loss=0.06635, ctc_loss=0.1001, over 1796401.00 frames. 2023-10-09 13:29:26,447 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB 2023-10-09 13:29:29,126 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.435e+02 3.467e+02 3.875e+02 4.625e+02 8.873e+02, threshold=7.750e+02, percent-clipped=12.0 2023-10-09 13:29:46,817 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2770796.0, ans=0.125 2023-10-09 13:29:48,845 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2770796.0, ans=0.2 2023-10-09 13:29:50,289 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0 2023-10-09 13:30:28,023 INFO [train.py:1031] (2/4) Epoch 14, batch 9050, loss[loss=0.1956, simple_loss=0.2105, pruned_loss=0.06596, ctc_loss=0.1219, over 15354.00 frames. ], tot_loss[loss=0.2076, simple_loss=0.2672, pruned_loss=0.05468, ctc_loss=0.09667, over 3298161.43 frames. 
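Note on the validation block above (train.py:1054-1064): training pauses, a fixed validation set (1796401 frames here) is scored with the same loss fields, and peak CUDA memory is reported. The memory line most plausibly comes from PyTorch's allocator statistics; a hedged sketch:

    import torch

    # Peak memory for this process on the current device, in MB (integer
    # division mirrors the "14591MB" formatting in the log).
    if torch.cuda.is_available():
        peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")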
], batch size: 529, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:30:36,935 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2770982.6666666665, ans=0.05 2023-10-09 13:30:54,831 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=2771076.0, ans=0.1 2023-10-09 13:31:11,842 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2771122.6666666665, ans=0.07 2023-10-09 13:31:27,928 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2771216.0, ans=0.125 2023-10-09 13:31:29,168 INFO [train.py:1031] (2/4) Epoch 14, batch 9100, loss[loss=0.1915, simple_loss=0.244, pruned_loss=0.05054, ctc_loss=0.09499, over 15184.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.264, pruned_loss=0.05568, ctc_loss=0.09792, over 3302663.75 frames. ], batch size: 529, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:31:34,400 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+02 2.919e+02 3.286e+02 3.918e+02 6.845e+02, threshold=6.573e+02, percent-clipped=0.0 2023-10-09 13:31:37,339 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.59 vs. limit=15.0 2023-10-09 13:31:54,774 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2771309.3333333335, ans=0.2 2023-10-09 13:31:56,612 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2771309.3333333335, ans=0.0 2023-10-09 13:32:12,128 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0 2023-10-09 13:32:12,695 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2771356.0, ans=0.1 2023-10-09 13:32:30,943 INFO [train.py:1031] (2/4) Epoch 14, batch 9150, loss[loss=0.2691, simple_loss=0.3163, pruned_loss=0.08159, ctc_loss=0.147, over 16819.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.266, pruned_loss=0.05405, ctc_loss=0.0961, over 3302483.96 frames. ], batch size: 384, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:32:31,267 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2771449.3333333335, ans=0.1 2023-10-09 13:32:36,674 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2771449.3333333335, ans=0.05 2023-10-09 13:32:43,650 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2771496.0, ans=0.125 2023-10-09 13:32:44,797 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2771496.0, ans=0.125 2023-10-09 13:33:26,333 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2771636.0, ans=0.2 2023-10-09 13:33:30,898 INFO [train.py:1031] (2/4) Epoch 14, batch 9200, loss[loss=0.2089, simple_loss=0.24, pruned_loss=0.06482, ctc_loss=0.1204, over 15453.00 frames. 
], tot_loss[loss=0.2113, simple_loss=0.2697, pruned_loss=0.05646, ctc_loss=0.1001, over 3302254.53 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:33:36,548 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2771682.6666666665, ans=0.125 2023-10-09 13:33:37,287 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.803e+02 3.406e+02 4.271e+02 8.869e+02, threshold=6.811e+02, percent-clipped=4.0 2023-10-09 13:33:45,458 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2771729.3333333335, ans=0.1 2023-10-09 13:33:59,397 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2771776.0, ans=0.0 2023-10-09 13:34:02,638 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=22.5 2023-10-09 13:34:20,151 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2771869.3333333335, ans=0.0 2023-10-09 13:34:32,045 INFO [train.py:1031] (2/4) Epoch 14, batch 9250, loss[loss=0.2343, simple_loss=0.2841, pruned_loss=0.06804, ctc_loss=0.1212, over 16417.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2704, pruned_loss=0.05803, ctc_loss=0.1022, over 3302888.06 frames. ], batch size: 416, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:34:36,074 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2771916.0, ans=0.2 2023-10-09 13:34:53,145 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2771962.6666666665, ans=0.0 2023-10-09 13:34:53,169 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2771962.6666666665, ans=0.1 2023-10-09 13:34:55,142 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:35:33,983 INFO [train.py:1031] (2/4) Epoch 14, batch 9300, loss[loss=0.2516, simple_loss=0.335, pruned_loss=0.06082, ctc_loss=0.1166, over 16845.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2715, pruned_loss=0.05704, ctc_loss=0.101, over 3293777.41 frames. ], batch size: 215, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:35:38,098 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2772149.3333333335, ans=0.125 2023-10-09 13:35:41,376 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 2.952e+02 3.315e+02 3.911e+02 8.519e+02, threshold=6.629e+02, percent-clipped=4.0 2023-10-09 13:35:41,809 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2772149.3333333335, ans=0.0 2023-10-09 13:35:44,152 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.15 vs. 
limit=10.0 2023-10-09 13:36:03,432 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2772242.6666666665, ans=0.0 2023-10-09 13:36:21,558 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2772289.3333333335, ans=0.125 2023-10-09 13:36:35,774 INFO [train.py:1031] (2/4) Epoch 14, batch 9350, loss[loss=0.1983, simple_loss=0.2731, pruned_loss=0.04521, ctc_loss=0.08289, over 16966.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2776, pruned_loss=0.05874, ctc_loss=0.1041, over 3304812.67 frames. ], batch size: 216, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:36:42,970 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.34 vs. limit=22.5 2023-10-09 13:36:54,548 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0 2023-10-09 13:37:00,003 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2772476.0, ans=0.125 2023-10-09 13:37:00,185 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.49 vs. limit=15.0 2023-10-09 13:37:04,390 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2772476.0, ans=0.125 2023-10-09 13:37:31,791 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2772569.3333333335, ans=0.0 2023-10-09 13:37:39,101 INFO [train.py:1031] (2/4) Epoch 14, batch 9400, loss[loss=0.2453, simple_loss=0.3557, pruned_loss=0.04832, ctc_loss=0.09563, over 15195.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2844, pruned_loss=0.05846, ctc_loss=0.1042, over 3295423.52 frames. 
], batch size: 527, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:37:39,397 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2772616.0, ans=0.125 2023-10-09 13:37:46,565 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 3.369e+02 4.289e+02 5.603e+02 1.054e+03, threshold=8.577e+02, percent-clipped=14.0 2023-10-09 13:37:54,638 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2772662.6666666665, ans=0.125 2023-10-09 13:38:00,554 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:38:03,247 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2772709.3333333335, ans=0.1 2023-10-09 13:38:10,087 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2772709.3333333335, ans=0.0 2023-10-09 13:38:14,983 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2772756.0, ans=0.125 2023-10-09 13:38:20,273 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2772756.0, ans=0.0 2023-10-09 13:38:41,001 INFO [train.py:1031] (2/4) Epoch 14, batch 9450, loss[loss=0.1901, simple_loss=0.2777, pruned_loss=0.03594, ctc_loss=0.0767, over 15295.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2867, pruned_loss=0.05549, ctc_loss=0.09962, over 3301457.99 frames. ], batch size: 526, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:39:00,112 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.40 vs. limit=15.0 2023-10-09 13:39:36,972 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2773036.0, ans=0.125 2023-10-09 13:39:40,289 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2773036.0, ans=0.0 2023-10-09 13:39:43,173 INFO [train.py:1031] (2/4) Epoch 14, batch 9500, loss[loss=0.2533, simple_loss=0.3093, pruned_loss=0.07584, ctc_loss=0.1137, over 16821.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2865, pruned_loss=0.05826, ctc_loss=0.1038, over 3309487.57 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:39:51,849 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 3.046e+02 3.571e+02 4.127e+02 8.787e+02, threshold=7.141e+02, percent-clipped=1.0 2023-10-09 13:39:59,753 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2773129.3333333335, ans=0.0 2023-10-09 13:40:19,861 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2773222.6666666665, ans=0.05 2023-10-09 13:40:21,342 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.16 vs. 
limit=10.0 2023-10-09 13:40:26,181 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2773222.6666666665, ans=0.0 2023-10-09 13:40:28,320 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2773222.6666666665, ans=0.125 2023-10-09 13:40:46,272 INFO [train.py:1031] (2/4) Epoch 14, batch 9550, loss[loss=0.2414, simple_loss=0.3006, pruned_loss=0.06725, ctc_loss=0.1191, over 16870.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2898, pruned_loss=0.06297, ctc_loss=0.1115, over 3307419.92 frames. ], batch size: 228, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:41:03,877 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2773362.6666666665, ans=0.0 2023-10-09 13:41:09,332 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2773362.6666666665, ans=0.125 2023-10-09 13:41:11,255 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2773409.3333333335, ans=0.125 2023-10-09 13:41:14,249 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2773409.3333333335, ans=0.125 2023-10-09 13:41:33,032 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=12.0 2023-10-09 13:41:34,165 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.35 vs. limit=15.0 2023-10-09 13:41:34,720 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2773502.6666666665, ans=0.0 2023-10-09 13:41:36,436 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2773502.6666666665, ans=0.1 2023-10-09 13:41:39,691 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2773502.6666666665, ans=0.035 2023-10-09 13:41:48,548 INFO [train.py:1031] (2/4) Epoch 14, batch 9600, loss[loss=0.2666, simple_loss=0.3095, pruned_loss=0.08365, ctc_loss=0.1411, over 16747.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2943, pruned_loss=0.0667, ctc_loss=0.1178, over 3309666.10 frames. ], batch size: 111, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:41:56,039 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2773549.3333333335, ans=0.125 2023-10-09 13:42:00,632 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.675e+02 3.309e+02 3.670e+02 4.199e+02 1.268e+03, threshold=7.340e+02, percent-clipped=3.0 2023-10-09 13:42:05,840 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2773596.0, ans=0.125 2023-10-09 13:42:52,928 INFO [train.py:1031] (2/4) Epoch 14, batch 9650, loss[loss=0.244, simple_loss=0.3053, pruned_loss=0.0685, ctc_loss=0.1146, over 16808.00 frames. ], tot_loss[loss=0.2405, simple_loss=0.2969, pruned_loss=0.06803, ctc_loss=0.1199, over 3303583.78 frames. 
], batch size: 121, lr: 2.58e-03, grad_scale: 1.0 2023-10-09 13:42:59,224 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2773782.6666666665, ans=0.125 2023-10-09 13:43:03,004 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2773782.6666666665, ans=0.125 2023-10-09 13:43:07,536 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2773829.3333333335, ans=0.125 2023-10-09 13:43:55,644 INFO [train.py:1031] (2/4) Epoch 14, batch 9700, loss[loss=0.2055, simple_loss=0.2691, pruned_loss=0.05249, ctc_loss=0.09257, over 16842.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2954, pruned_loss=0.06475, ctc_loss=0.1147, over 3308913.65 frames. ], batch size: 202, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:44:01,874 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2774016.0, ans=0.0 2023-10-09 13:44:03,940 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2774016.0, ans=0.2 2023-10-09 13:44:06,800 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.889e+02 3.444e+02 4.302e+02 1.235e+03, threshold=6.889e+02, percent-clipped=2.0 2023-10-09 13:44:11,938 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2774062.6666666665, ans=0.2 2023-10-09 13:44:12,297 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.65 vs. limit=6.0 2023-10-09 13:44:33,998 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2774156.0, ans=0.125 2023-10-09 13:44:35,027 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2774156.0, ans=0.125 2023-10-09 13:44:54,308 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2774202.6666666665, ans=0.125 2023-10-09 13:44:56,794 INFO [train.py:1031] (2/4) Epoch 14, batch 9750, loss[loss=0.234, simple_loss=0.2518, pruned_loss=0.08066, ctc_loss=0.1373, over 16526.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2875, pruned_loss=0.06354, ctc_loss=0.1123, over 3310737.41 frames. ], batch size: 384, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:45:06,679 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2774249.3333333335, ans=0.95 2023-10-09 13:45:13,482 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2774296.0, ans=0.125 2023-10-09 13:45:15,812 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2774296.0, ans=0.1 2023-10-09 13:45:31,970 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2774342.6666666665, ans=0.2 2023-10-09 13:45:41,419 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.46 vs. 
limit=22.5 2023-10-09 13:45:42,753 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2774389.3333333335, ans=0.2 2023-10-09 13:45:45,624 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2774389.3333333335, ans=0.125 2023-10-09 13:45:59,207 INFO [train.py:1031] (2/4) Epoch 14, batch 9800, loss[loss=0.1962, simple_loss=0.315, pruned_loss=0.02791, ctc_loss=0.05374, over 16237.00 frames. ], tot_loss[loss=0.2257, simple_loss=0.2854, pruned_loss=0.06127, ctc_loss=0.1085, over 3308067.96 frames. ], batch size: 463, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:46:07,747 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2774482.6666666665, ans=0.125 2023-10-09 13:46:09,850 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2774482.6666666665, ans=0.125 2023-10-09 13:46:11,661 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+02 3.062e+02 3.512e+02 4.119e+02 7.038e+02, threshold=7.024e+02, percent-clipped=1.0 2023-10-09 13:46:20,961 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2774529.3333333335, ans=0.125 2023-10-09 13:46:22,457 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. limit=10.0 2023-10-09 13:46:53,759 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2774669.3333333335, ans=0.125 2023-10-09 13:46:59,684 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.14 vs. limit=15.0 2023-10-09 13:47:01,109 INFO [train.py:1031] (2/4) Epoch 14, batch 9850, loss[loss=0.2173, simple_loss=0.2821, pruned_loss=0.0553, ctc_loss=0.1048, over 16315.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2855, pruned_loss=0.06184, ctc_loss=0.1093, over 3308314.96 frames. ], batch size: 463, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:47:07,866 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2774716.0, ans=0.125 2023-10-09 13:47:38,689 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2774856.0, ans=0.0 2023-10-09 13:47:43,038 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2774856.0, ans=0.1 2023-10-09 13:47:49,857 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=15.0 2023-10-09 13:47:55,265 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=22.5 2023-10-09 13:47:58,136 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2774902.6666666665, ans=0.125 2023-10-09 13:48:02,735 INFO [train.py:1031] (2/4) Epoch 14, batch 9900, loss[loss=0.1732, simple_loss=0.2341, pruned_loss=0.04202, ctc_loss=0.07042, over 16826.00 frames. 
], tot_loss[loss=0.222, simple_loss=0.2792, pruned_loss=0.0609, ctc_loss=0.1076, over 3298527.75 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:48:16,788 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.336e+02 2.878e+02 3.183e+02 3.713e+02 1.156e+03, threshold=6.367e+02, percent-clipped=1.0 2023-10-09 13:48:22,576 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.57 vs. limit=15.0 2023-10-09 13:48:23,686 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2774996.0, ans=0.0 2023-10-09 13:48:34,307 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2775042.6666666665, ans=0.125 2023-10-09 13:48:41,677 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2775089.3333333335, ans=0.2 2023-10-09 13:48:46,741 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2775089.3333333335, ans=0.125 2023-10-09 13:48:49,542 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2775089.3333333335, ans=0.125 2023-10-09 13:49:05,473 INFO [train.py:1031] (2/4) Epoch 14, batch 9950, loss[loss=0.1729, simple_loss=0.2283, pruned_loss=0.04301, ctc_loss=0.07859, over 16718.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.2734, pruned_loss=0.05979, ctc_loss=0.1056, over 3301186.34 frames. ], batch size: 130, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:49:11,217 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2775182.6666666665, ans=0.0 2023-10-09 13:49:30,252 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0 2023-10-09 13:49:36,400 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2775276.0, ans=0.05 2023-10-09 13:49:38,207 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2775276.0, ans=0.0 2023-10-09 13:49:44,110 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2775322.6666666665, ans=0.05 2023-10-09 13:49:55,503 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2775369.3333333335, ans=0.0 2023-10-09 13:50:08,677 INFO [train.py:1031] (2/4) Epoch 14, batch 10000, loss[loss=0.1923, simple_loss=0.2442, pruned_loss=0.05169, ctc_loss=0.09234, over 16819.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2687, pruned_loss=0.05718, ctc_loss=0.1009, over 3306818.11 frames. 
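Note on the ScheduledFloat records: each named hyperparameter (dropout probabilities, skip rates, balancer probs, bypass scale minimums) is a ScheduledFloat from icefall's scaling.py, i.e. a value that follows a piecewise-linear schedule over the global batch_count; ans is the value in effect when the record is emitted. A minimal sketch of that evaluation, with made-up schedule breakpoints:

    def scheduled_float(batch_count: float, points=((0.0, 0.3), (20000.0, 0.125))):
        # Piecewise-linear interpolation over (batch_count, value) breakpoints;
        # the two points here are illustrative, not from the training config.
        (x0, y0), (x1, y1) = points[0], points[-1]
        if batch_count <= x0:
            return y0
        if batch_count >= x1:
            return y1
        t = (batch_count - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    # Far past the final breakpoint, so the schedule has settled at 0.125,
    # matching the many "ans=0.125" records above.
    print(scheduled_float(2775416.0))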
], batch size: 176, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:50:11,664 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2775416.0, ans=0.2 2023-10-09 13:50:24,893 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.285e+02 2.835e+02 3.193e+02 3.667e+02 1.150e+03, threshold=6.386e+02, percent-clipped=3.0 2023-10-09 13:50:35,424 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2775509.3333333335, ans=0.2 2023-10-09 13:50:50,284 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=22.5 2023-10-09 13:50:53,618 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2775556.0, ans=0.125 2023-10-09 13:50:55,663 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2775556.0, ans=0.2 2023-10-09 13:51:03,003 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2775602.6666666665, ans=0.0 2023-10-09 13:51:04,011 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2775602.6666666665, ans=0.125 2023-10-09 13:51:06,649 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2775602.6666666665, ans=0.1 2023-10-09 13:51:10,632 INFO [train.py:1031] (2/4) Epoch 14, batch 10050, loss[loss=0.182, simple_loss=0.2406, pruned_loss=0.04613, ctc_loss=0.07776, over 16928.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.2655, pruned_loss=0.05828, ctc_loss=0.1026, over 3291155.92 frames. ], batch size: 82, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:51:48,216 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2775789.3333333335, ans=0.2 2023-10-09 13:51:56,771 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2775789.3333333335, ans=0.125 2023-10-09 13:52:13,732 INFO [train.py:1031] (2/4) Epoch 14, batch 10100, loss[loss=0.1837, simple_loss=0.2408, pruned_loss=0.04788, ctc_loss=0.07739, over 16948.00 frames. ], tot_loss[loss=0.2101, simple_loss=0.2632, pruned_loss=0.05802, ctc_loss=0.1022, over 3288012.43 frames. ], batch size: 86, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:52:30,295 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.211e+02 2.838e+02 3.162e+02 3.584e+02 6.355e+02, threshold=6.323e+02, percent-clipped=0.0 2023-10-09 13:52:35,731 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2775929.3333333335, ans=0.125 2023-10-09 13:52:38,219 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2775976.0, ans=0.125 2023-10-09 13:52:38,245 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2775976.0, ans=0.0 2023-10-09 13:53:03,497 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.90 vs. 
limit=15.0 2023-10-09 13:53:10,119 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2776069.3333333335, ans=0.0 2023-10-09 13:53:12,916 INFO [train.py:1031] (2/4) Epoch 14, batch 10150, loss[loss=0.2517, simple_loss=0.296, pruned_loss=0.07612, ctc_loss=0.1379, over 16895.00 frames. ], tot_loss[loss=0.2127, simple_loss=0.2643, pruned_loss=0.05965, ctc_loss=0.1047, over 3286448.48 frames. ], batch size: 309, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:53:19,135 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2776116.0, ans=0.2 2023-10-09 13:53:24,814 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:53:31,514 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.73 vs. limit=12.0 2023-10-09 13:53:33,284 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2776162.6666666665, ans=0.125 2023-10-09 13:54:07,717 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2776302.6666666665, ans=0.125 2023-10-09 13:54:12,029 INFO [train.py:1031] (2/4) Epoch 14, batch 10200, loss[loss=0.2089, simple_loss=0.257, pruned_loss=0.06041, ctc_loss=0.09976, over 16956.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2672, pruned_loss=0.06154, ctc_loss=0.1078, over 3300782.10 frames. ], batch size: 78, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:54:28,916 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.256e+02 3.649e+02 4.243e+02 9.669e+02, threshold=7.298e+02, percent-clipped=6.0 2023-10-09 13:54:36,111 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2776442.6666666665, ans=0.2 2023-10-09 13:54:45,435 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2776442.6666666665, ans=0.2 2023-10-09 13:54:56,096 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2776489.3333333335, ans=0.2 2023-10-09 13:55:00,563 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 13:55:06,138 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2776536.0, ans=0.0 2023-10-09 13:55:07,197 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2776536.0, ans=0.05 2023-10-09 13:55:11,010 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2776536.0, ans=0.125 2023-10-09 13:55:12,753 INFO [train.py:1031] (2/4) Epoch 14, batch 10250, loss[loss=0.2148, simple_loss=0.2683, pruned_loss=0.06002, ctc_loss=0.1032, over 16921.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.2651, pruned_loss=0.06134, ctc_loss=0.1071, over 3305248.57 frames. 
], batch size: 78, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:55:30,049 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2776629.3333333335, ans=0.0 2023-10-09 13:55:40,324 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2776676.0, ans=0.125 2023-10-09 13:55:49,041 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2776722.6666666665, ans=0.125 2023-10-09 13:56:09,156 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2776769.3333333335, ans=0.125 2023-10-09 13:56:14,048 INFO [train.py:1031] (2/4) Epoch 14, batch 10300, loss[loss=0.1989, simple_loss=0.2552, pruned_loss=0.05273, ctc_loss=0.09274, over 16865.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2652, pruned_loss=0.06274, ctc_loss=0.1096, over 3309975.53 frames. ], batch size: 189, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:56:15,383 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2776816.0, ans=0.1 2023-10-09 13:56:16,497 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2776816.0, ans=0.2 2023-10-09 13:56:18,828 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2776816.0, ans=0.0 2023-10-09 13:56:33,609 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.662e+02 3.333e+02 3.833e+02 4.530e+02 9.139e+02, threshold=7.666e+02, percent-clipped=3.0 2023-10-09 13:56:34,580 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2776862.6666666665, ans=0.2 2023-10-09 13:56:34,611 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2776862.6666666665, ans=0.125 2023-10-09 13:56:38,831 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2776909.3333333335, ans=0.2 2023-10-09 13:56:38,831 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2776909.3333333335, ans=0.09899494936611666 2023-10-09 13:56:39,880 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2776909.3333333335, ans=0.0 2023-10-09 13:56:41,404 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.75 vs. limit=10.0 2023-10-09 13:56:48,468 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2776909.3333333335, ans=0.5 2023-10-09 13:56:48,798 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.28 vs. 
limit=15.0 2023-10-09 13:56:58,795 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2776956.0, ans=0.09899494936611666 2023-10-09 13:57:07,175 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=2777002.6666666665, ans=6.0 2023-10-09 13:57:16,373 INFO [train.py:1031] (2/4) Epoch 14, batch 10350, loss[loss=0.2067, simple_loss=0.2963, pruned_loss=0.04287, ctc_loss=0.07833, over 16886.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2666, pruned_loss=0.0618, ctc_loss=0.1081, over 3304892.26 frames. ], batch size: 309, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 13:57:28,982 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2777096.0, ans=0.125 2023-10-09 13:57:33,867 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2777096.0, ans=0.0 2023-10-09 13:57:34,387 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.48 vs. limit=15.0 2023-10-09 13:57:45,318 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=15.0 2023-10-09 13:57:51,510 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2777142.6666666665, ans=0.125 2023-10-09 13:57:51,770 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.34 vs. limit=6.0 2023-10-09 13:58:04,924 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2777236.0, ans=0.1 2023-10-09 13:58:17,996 INFO [train.py:1031] (2/4) Epoch 14, batch 10400, loss[loss=0.2546, simple_loss=0.3188, pruned_loss=0.06908, ctc_loss=0.1305, over 16583.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2692, pruned_loss=0.05729, ctc_loss=0.1012, over 3300801.46 frames. ], batch size: 350, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:58:22,325 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2777282.6666666665, ans=0.2 2023-10-09 13:58:31,788 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=12.0 2023-10-09 13:58:37,078 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.958e+02 3.554e+02 4.330e+02 8.227e+02, threshold=7.107e+02, percent-clipped=1.0 2023-10-09 13:58:37,416 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2777329.3333333335, ans=0.1 2023-10-09 13:58:38,723 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.51 vs. 
limit=10.0 2023-10-09 13:58:39,554 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2777329.3333333335, ans=0.1 2023-10-09 13:59:02,971 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2777422.6666666665, ans=0.125 2023-10-09 13:59:20,076 INFO [train.py:1031] (2/4) Epoch 14, batch 10450, loss[loss=0.2524, simple_loss=0.301, pruned_loss=0.07492, ctc_loss=0.135, over 16908.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2755, pruned_loss=0.05931, ctc_loss=0.1049, over 3291255.81 frames. ], batch size: 243, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 13:59:38,205 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2777562.6666666665, ans=0.2 2023-10-09 13:59:43,175 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2777562.6666666665, ans=0.09899494936611666 2023-10-09 13:59:44,567 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.63 vs. limit=15.0 2023-10-09 13:59:52,092 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2777609.3333333335, ans=0.09899494936611666 2023-10-09 14:00:20,054 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-10-09 14:00:21,482 INFO [train.py:1031] (2/4) Epoch 14, batch 10500, loss[loss=0.2141, simple_loss=0.2607, pruned_loss=0.06174, ctc_loss=0.1099, over 16777.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2761, pruned_loss=0.06181, ctc_loss=0.1084, over 3290194.47 frames. ], batch size: 309, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 14:00:43,444 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.453e+02 3.496e+02 3.857e+02 4.755e+02 1.181e+03, threshold=7.715e+02, percent-clipped=1.0 2023-10-09 14:00:45,879 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2777842.6666666665, ans=0.05 2023-10-09 14:00:46,907 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=2777842.6666666665, ans=0.05 2023-10-09 14:01:04,212 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2777889.3333333335, ans=0.04949747468305833 2023-10-09 14:01:04,446 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=12.0 2023-10-09 14:01:07,863 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2777889.3333333335, ans=0.0 2023-10-09 14:01:22,068 INFO [train.py:1031] (2/4) Epoch 14, batch 10550, loss[loss=0.2, simple_loss=0.2469, pruned_loss=0.05713, ctc_loss=0.09722, over 16926.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2726, pruned_loss=0.06119, ctc_loss=0.107, over 3293075.02 frames. 
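Note on the Whitening records (scaling.py:979): each compares a whitening metric of a module's activations against a limit; only when the metric exceeds the limit does the Whiten module apply a gradient penalty pushing the feature covariance back toward a multiple of the identity. One plausible reading of the metric for the num_groups=1 case, under the assumption that it measures eigenvalue spread of the covariance (a perfectly white covariance gives 1.0); this is a sketch, not the exact icefall formula:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (frames, num_channels); num_groups=1 as in most records above.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # Ratio of mean squared eigenvalue to squared mean eigenvalue:
        # 1.0 for an identity-like covariance, larger as it degenerates.
        return (eigs ** 2).mean() / eigs.mean() ** 2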
], batch size: 78, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 14:01:28,745 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2777982.6666666665, ans=0.2 2023-10-09 14:01:54,746 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2778076.0, ans=0.0 2023-10-09 14:01:55,801 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2778076.0, ans=0.125 2023-10-09 14:02:02,500 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2778122.6666666665, ans=0.09899494936611666 2023-10-09 14:02:05,081 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2778122.6666666665, ans=0.0 2023-10-09 14:02:17,389 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2778169.3333333335, ans=0.125 2023-10-09 14:02:24,162 INFO [train.py:1031] (2/4) Epoch 14, batch 10600, loss[loss=0.248, simple_loss=0.31, pruned_loss=0.06892, ctc_loss=0.1205, over 16760.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.2746, pruned_loss=0.06026, ctc_loss=0.1059, over 3300193.70 frames. ], batch size: 272, lr: 2.58e-03, grad_scale: 4.0 2023-10-09 14:02:47,793 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+02 3.164e+02 3.650e+02 4.243e+02 8.211e+02, threshold=7.299e+02, percent-clipped=2.0 2023-10-09 14:03:24,432 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2778402.6666666665, ans=0.0 2023-10-09 14:03:26,249 INFO [train.py:1031] (2/4) Epoch 14, batch 10650, loss[loss=0.209, simple_loss=0.2629, pruned_loss=0.05845, ctc_loss=0.09541, over 16744.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2795, pruned_loss=0.06211, ctc_loss=0.1089, over 3297770.87 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 2.0 2023-10-09 14:03:32,799 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2023-10-09 14:03:39,579 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2778496.0, ans=0.05 2023-10-09 14:03:58,576 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2023-10-09 14:04:28,567 INFO [train.py:1031] (2/4) Epoch 14, batch 10700, loss[loss=0.1935, simple_loss=0.2624, pruned_loss=0.04565, ctc_loss=0.08326, over 16980.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2744, pruned_loss=0.05923, ctc_loss=0.1036, over 3282976.48 frames. 
], batch size: 216, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:04:32,731 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2778682.6666666665, ans=0.125 2023-10-09 14:04:33,834 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2778682.6666666665, ans=0.125 2023-10-09 14:04:52,615 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.127e+02 3.058e+02 3.576e+02 4.175e+02 9.953e+02, threshold=7.153e+02, percent-clipped=1.0 2023-10-09 14:05:28,568 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2778869.3333333335, ans=0.2 2023-10-09 14:05:29,617 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2778869.3333333335, ans=0.0 2023-10-09 14:05:32,629 INFO [train.py:1031] (2/4) Epoch 14, batch 10750, loss[loss=0.2463, simple_loss=0.3022, pruned_loss=0.0708, ctc_loss=0.1219, over 16590.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2797, pruned_loss=0.06202, ctc_loss=0.1082, over 3282803.17 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:05:37,877 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2778916.0, ans=0.125 2023-10-09 14:05:46,562 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2778962.6666666665, ans=0.2 2023-10-09 14:06:00,762 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2779009.3333333335, ans=0.125 2023-10-09 14:06:07,401 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2779009.3333333335, ans=0.125 2023-10-09 14:06:09,812 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0 2023-10-09 14:06:10,466 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2779056.0, ans=0.125 2023-10-09 14:06:17,074 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2779056.0, ans=0.125 2023-10-09 14:06:26,778 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2779102.6666666665, ans=0.0 2023-10-09 14:06:35,716 INFO [train.py:1031] (2/4) Epoch 14, batch 10800, loss[loss=0.2723, simple_loss=0.2903, pruned_loss=0.09387, ctc_loss=0.1667, over 16558.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2798, pruned_loss=0.06429, ctc_loss=0.1121, over 3284582.33 frames. ], batch size: 384, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:07:01,261 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.600e+02 3.349e+02 3.657e+02 4.515e+02 8.469e+02, threshold=7.313e+02, percent-clipped=4.0 2023-10-09 14:07:29,626 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2779336.0, ans=0.0 2023-10-09 14:07:36,231 INFO [train.py:1031] (2/4) Epoch 14, batch 10850, loss[loss=0.197, simple_loss=0.2563, pruned_loss=0.05234, ctc_loss=0.08253, over 16945.00 frames. 
2023-10-09 14:07:39,206 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2779382.6666666665, ans=0.0
2023-10-09 14:07:43,963 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2779382.6666666665, ans=0.04949747468305833
2023-10-09 14:07:48,765 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2779429.3333333335, ans=0.0
2023-10-09 14:08:08,556 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2779476.0, ans=0.07
2023-10-09 14:08:15,387 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.99 vs. limit=22.5
2023-10-09 14:08:28,596 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2779569.3333333335, ans=0.125
2023-10-09 14:08:31,171 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2779569.3333333335, ans=0.0
2023-10-09 14:08:38,606 INFO [train.py:1031] (2/4) Epoch 14, batch 10900, loss[loss=0.2082, simple_loss=0.2579, pruned_loss=0.05783, ctc_loss=0.1073, over 16797.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2703, pruned_loss=0.06299, ctc_loss=0.1101, over 3300487.51 frames. ], batch size: 273, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:08:43,825 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2779616.0, ans=0.2
2023-10-09 14:08:48,870 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=15.0
2023-10-09 14:09:05,442 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.210e+02 3.855e+02 4.821e+02 1.226e+03, threshold=7.710e+02, percent-clipped=2.0
2023-10-09 14:09:16,221 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2779756.0, ans=0.1
2023-10-09 14:09:22,864 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=2779756.0, ans=0.5
2023-10-09 14:09:24,318 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.48 vs. limit=15.0
2023-10-09 14:09:39,564 INFO [train.py:1031] (2/4) Epoch 14, batch 10950, loss[loss=0.2083, simple_loss=0.2617, pruned_loss=0.05889, ctc_loss=0.09282, over 11641.00 frames. ], tot_loss[loss=0.2159, simple_loss=0.2651, pruned_loss=0.06174, ctc_loss=0.1082, over 3298477.32 frames. ], batch size: 35, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:09:45,291 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2779849.3333333335, ans=0.125
2023-10-09 14:09:47,593 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0
2023-10-09 14:09:48,400 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2779849.3333333335, ans=0.0
2023-10-09 14:09:54,943 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2779896.0, ans=0.125
2023-10-09 14:10:02,526 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2779896.0, ans=0.2
2023-10-09 14:10:40,667 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2780036.0, ans=0.0
2023-10-09 14:10:42,343 INFO [train.py:1031] (2/4) Epoch 14, batch 11000, loss[loss=0.2819, simple_loss=0.301, pruned_loss=0.09689, ctc_loss=0.1727, over 16525.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2645, pruned_loss=0.06246, ctc_loss=0.1093, over 3296911.00 frames. ], batch size: 417, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:11:11,554 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.328e+02 3.883e+02 5.018e+02 9.874e+02, threshold=7.766e+02, percent-clipped=3.0
2023-10-09 14:11:29,767 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0
2023-10-09 14:11:41,084 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2780269.3333333335, ans=0.0
2023-10-09 14:11:46,299 INFO [train.py:1031] (2/4) Epoch 14, batch 11050, loss[loss=0.2285, simple_loss=0.2937, pruned_loss=0.05985, ctc_loss=0.1091, over 16840.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2724, pruned_loss=0.06569, ctc_loss=0.1147, over 3301745.38 frames. ], batch size: 242, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:12:23,538 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2780456.0, ans=0.125
2023-10-09 14:12:36,895 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2780502.6666666665, ans=0.125
2023-10-09 14:12:47,889 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2780502.6666666665, ans=0.0
2023-10-09 14:12:49,759 INFO [train.py:1031] (2/4) Epoch 14, batch 11100, loss[loss=0.1732, simple_loss=0.2422, pruned_loss=0.0387, ctc_loss=0.06684, over 16811.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2757, pruned_loss=0.06358, ctc_loss=0.1112, over 3295471.78 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:13:01,001 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.19 vs. limit=15.0
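
The ScheduledFloat entries that dominate these logs record module hyper-parameters (dropout probabilities, skip rates, balancer limits, ...) whose value "ans" is a deterministic function of batch_count. A minimal sketch of such a schedule, assumed here to be piecewise-linear between (batch_count, value) breakpoints and constant outside them; the real ScheduledFloat in icefall's scaling.py carries more machinery:

    # Hypothetical stand-in for icefall's ScheduledFloat.
    class PiecewiseLinearSchedule:
        def __init__(self, *points):
            # points: (batch_count, value) pairs, e.g. ((0.0, 0.3), (20000.0, 0.1))
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            x0, y0 = self.points[0]
            if batch_count <= x0:
                return y0
            for x1, y1 in self.points[1:]:
                if batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
                x0, y0 = x1, y1
            return y0  # past the last breakpoint: hold the final value

    # e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches:
    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    assert abs(dropout_p.value(10000.0) - 0.2) < 1e-9
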
2023-10-09 14:13:07,028 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2780596.0, ans=0.0
2023-10-09 14:13:16,822 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2780642.6666666665, ans=0.0
2023-10-09 14:13:18,549 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.716e+02 3.611e+02 4.307e+02 5.885e+02 1.880e+03, threshold=8.614e+02, percent-clipped=7.0
2023-10-09 14:13:29,236 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2780689.3333333335, ans=0.2
2023-10-09 14:13:37,550 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:13:45,807 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2780736.0, ans=0.1
2023-10-09 14:13:51,663 INFO [train.py:1031] (2/4) Epoch 14, batch 11150, loss[loss=0.2138, simple_loss=0.256, pruned_loss=0.0633, ctc_loss=0.1125, over 16302.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.2733, pruned_loss=0.0626, ctc_loss=0.1095, over 3291331.08 frames. ], batch size: 415, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:14:05,323 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.45 vs. limit=12.0
2023-10-09 14:14:10,666 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2780829.3333333335, ans=0.125
2023-10-09 14:14:14,485 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.54 vs. limit=15.0
2023-10-09 14:14:49,109 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2780969.3333333335, ans=0.2
2023-10-09 14:14:51,866 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.52 vs. limit=15.0
2023-10-09 14:14:53,090 INFO [train.py:1031] (2/4) Epoch 14, batch 11200, loss[loss=0.252, simple_loss=0.3216, pruned_loss=0.06735, ctc_loss=0.119, over 16772.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.2738, pruned_loss=0.06374, ctc_loss=0.1111, over 3301780.70 frames. ], batch size: 272, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:15:14,017 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0
2023-10-09 14:15:25,074 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+02 3.168e+02 3.497e+02 4.095e+02 1.585e+03, threshold=6.993e+02, percent-clipped=3.0
2023-10-09 14:15:37,390 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2781156.0, ans=0.125
2023-10-09 14:15:55,280 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2781249.3333333335, ans=0.125
2023-10-09 14:15:55,968 INFO [train.py:1031] (2/4) Epoch 14, batch 11250, loss[loss=0.2701, simple_loss=0.3353, pruned_loss=0.07569, ctc_loss=0.134, over 16766.00 frames. ], tot_loss[loss=0.23, simple_loss=0.285, pruned_loss=0.06483, ctc_loss=0.1132, over 3301207.54 frames. ], batch size: 272, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:16:09,083 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2781296.0, ans=0.0
2023-10-09 14:16:16,363 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2781296.0, ans=0.2
2023-10-09 14:16:17,330 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2781296.0, ans=0.125
2023-10-09 14:16:17,435 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2781296.0, ans=0.2
2023-10-09 14:16:18,561 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2781296.0, ans=0.1
2023-10-09 14:16:49,842 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2781436.0, ans=10.0
2023-10-09 14:16:54,231 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2781436.0, ans=0.125
2023-10-09 14:17:03,083 INFO [train.py:1031] (2/4) Epoch 14, batch 11300, loss[loss=0.2175, simple_loss=0.2912, pruned_loss=0.05341, ctc_loss=0.09243, over 16778.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2886, pruned_loss=0.06296, ctc_loss=0.1101, over 3293059.25 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:17:05,567 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.55 vs. limit=10.0
2023-10-09 14:17:10,488 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2781482.6666666665, ans=0.1
2023-10-09 14:17:33,913 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.243e+02 3.091e+02 3.848e+02 4.979e+02 9.254e+02, threshold=7.696e+02, percent-clipped=6.0
2023-10-09 14:17:34,287 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2781576.0, ans=0.125
2023-10-09 14:17:37,651 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:17:40,621 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2781622.6666666665, ans=0.05
2023-10-09 14:17:55,463 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2781669.3333333335, ans=0.0
2023-10-09 14:17:58,585 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2781669.3333333335, ans=0.1
2023-10-09 14:18:01,464 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2781669.3333333335, ans=0.1
2023-10-09 14:18:04,196 INFO [train.py:1031] (2/4) Epoch 14, batch 11350, loss[loss=0.2542, simple_loss=0.2946, pruned_loss=0.07962, ctc_loss=0.1363, over 16603.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2879, pruned_loss=0.06121, ctc_loss=0.1079, over 3297099.80 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 4.0
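
Each [train.py:1031] line pairs the current batch's loss "over N frames" with a running tot_loss "over ~3.3M frames". One way to maintain such a read-out, sketched here as a frame-weighted average with exponential forgetting (the decay constant, and whether icefall forgets at all between resets, are assumptions):

    # Hypothetical running-average tracker for the tot_loss read-out.
    class FrameWeightedAverage:
        def __init__(self, decay: float = 0.999):
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, loss: float, num_frames: float) -> None:
            self.weighted_loss = self.decay * self.weighted_loss + loss * num_frames
            self.frames = self.decay * self.frames + num_frames

        @property
        def value(self) -> float:
            return self.weighted_loss / max(self.frames, 1.0)
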
2023-10-09 14:18:27,383 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2781762.6666666665, ans=0.125
2023-10-09 14:18:37,485 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.65 vs. limit=22.5
2023-10-09 14:18:48,128 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2781856.0, ans=0.0
2023-10-09 14:19:05,820 INFO [train.py:1031] (2/4) Epoch 14, batch 11400, loss[loss=0.2282, simple_loss=0.278, pruned_loss=0.06754, ctc_loss=0.1082, over 16793.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2857, pruned_loss=0.06257, ctc_loss=0.1101, over 3306634.99 frames. ], batch size: 111, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:19:06,183 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2781949.3333333335, ans=0.125
2023-10-09 14:19:14,716 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2781949.3333333335, ans=0.125
2023-10-09 14:19:15,816 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2781949.3333333335, ans=0.2
2023-10-09 14:19:29,850 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2782042.6666666665, ans=0.125
2023-10-09 14:19:34,982 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2782042.6666666665, ans=0.125
2023-10-09 14:19:37,884 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.191e+02 3.489e+02 4.256e+02 5.952e+02, threshold=6.979e+02, percent-clipped=0.0
2023-10-09 14:19:45,581 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2782089.3333333335, ans=0.125
2023-10-09 14:20:00,399 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.90 vs. limit=10.0
2023-10-09 14:20:04,323 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2782136.0, ans=0.125
2023-10-09 14:20:07,765 INFO [train.py:1031] (2/4) Epoch 14, batch 11450, loss[loss=0.2402, simple_loss=0.2857, pruned_loss=0.07189, ctc_loss=0.1274, over 16963.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2832, pruned_loss=0.06386, ctc_loss=0.1122, over 3301599.16 frames. ], batch size: 228, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:20:24,176 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2782229.3333333335, ans=0.125
2023-10-09 14:20:26,923 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2782229.3333333335, ans=0.125
2023-10-09 14:20:38,236 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.44 vs. limit=12.0
2023-10-09 14:20:38,357 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.56 vs. limit=10.0
2023-10-09 14:21:08,868 INFO [train.py:1031] (2/4) Epoch 14, batch 11500, loss[loss=0.2267, simple_loss=0.2871, pruned_loss=0.06221, ctc_loss=0.1049, over 16905.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.2842, pruned_loss=0.06552, ctc_loss=0.1147, over 3295332.89 frames. ], batch size: 258, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:21:12,020 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2782416.0, ans=0.0
2023-10-09 14:21:19,717 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2782416.0, ans=0.5
2023-10-09 14:21:33,112 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0
2023-10-09 14:21:42,410 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2782509.3333333335, ans=0.2
2023-10-09 14:21:44,090 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.359e+02 3.820e+02 4.365e+02 7.019e+02, threshold=7.640e+02, percent-clipped=1.0
2023-10-09 14:21:45,145 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2782509.3333333335, ans=0.125
2023-10-09 14:21:53,230 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=2782556.0, ans=0.1
2023-10-09 14:22:06,220 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2782602.6666666665, ans=0.125
2023-10-09 14:22:11,637 INFO [train.py:1031] (2/4) Epoch 14, batch 11550, loss[loss=0.1918, simple_loss=0.2599, pruned_loss=0.04617, ctc_loss=0.07848, over 16567.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.2875, pruned_loss=0.06722, ctc_loss=0.1179, over 3297844.43 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:22:21,846 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2782649.3333333335, ans=0.0
2023-10-09 14:22:31,174 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.41 vs. limit=10.0
2023-10-09 14:22:31,973 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=22.5
2023-10-09 14:22:51,851 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2782789.3333333335, ans=0.125
2023-10-09 14:23:15,895 INFO [train.py:1031] (2/4) Epoch 14, batch 11600, loss[loss=0.2791, simple_loss=0.3849, pruned_loss=0.06346, ctc_loss=0.1157, over 15002.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2926, pruned_loss=0.06599, ctc_loss=0.1163, over 3300556.82 frames. ], batch size: 525, lr: 2.57e-03, grad_scale: 2.0
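
The Whitening lines compare a per-module statistic against a limit (e.g. metric=9.41 vs. limit=10.0 above); when the metric exceeds the limit, the module's whitening penalty activates. One plausible formulation of the metric, assumed here, measures how far the feature covariance C is from a multiple of the identity: num_channels * trace(C @ C) / trace(C)**2, which is exactly 1.0 for perfectly white features and grows with the eigenvalue spread (icefall's scaling.py may normalize differently):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        """x: (..., num_channels); returns the metric averaged over channel groups."""
        x = x.reshape(-1, x.shape[-1])
        num_channels = x.shape[-1]
        x = x.reshape(-1, num_groups, num_channels // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / x.shape[1]   # (groups, c, c) covariance
        num = (cov * cov).sum(dim=(1, 2))          # trace(C @ C) for symmetric C
        den = torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) ** 2
        return (cov.shape[-1] * num / den).mean()
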
2023-10-09 14:23:16,386 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2782882.6666666665, ans=10.0
2023-10-09 14:23:24,869 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2782882.6666666665, ans=0.125
2023-10-09 14:23:37,039 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2782929.3333333335, ans=0.0
2023-10-09 14:23:52,932 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.208e+02 3.363e+02 4.073e+02 4.861e+02 8.872e+02, threshold=8.146e+02, percent-clipped=3.0
2023-10-09 14:23:54,385 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2783022.6666666665, ans=0.1
2023-10-09 14:24:16,976 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0
2023-10-09 14:24:19,981 INFO [train.py:1031] (2/4) Epoch 14, batch 11650, loss[loss=0.2071, simple_loss=0.2727, pruned_loss=0.05258, ctc_loss=0.09102, over 16847.00 frames. ], tot_loss[loss=0.24, simple_loss=0.2979, pruned_loss=0.06726, ctc_loss=0.1188, over 3292306.76 frames. ], batch size: 228, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:24:35,860 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0
2023-10-09 14:24:38,728 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0
2023-10-09 14:24:43,463 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2783162.6666666665, ans=0.0
2023-10-09 14:25:02,810 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2783256.0, ans=0.0
2023-10-09 14:25:05,436 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2783256.0, ans=0.125
2023-10-09 14:25:23,310 INFO [train.py:1031] (2/4) Epoch 14, batch 11700, loss[loss=0.2156, simple_loss=0.2608, pruned_loss=0.06409, ctc_loss=0.1058, over 16737.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2907, pruned_loss=0.06583, ctc_loss=0.1158, over 3295223.59 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:25:26,807 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=12.0
2023-10-09 14:25:34,434 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2783396.0, ans=0.1
2023-10-09 14:25:36,132 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:25:37,491 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=22.5
2023-10-09 14:25:45,045 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2783396.0, ans=10.0
2023-10-09 14:25:58,213 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.451e+02 4.279e+02 5.142e+02 9.107e+02, threshold=8.558e+02, percent-clipped=4.0
2023-10-09 14:26:03,835 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2783489.3333333335, ans=0.0
2023-10-09 14:26:23,054 INFO [train.py:1031] (2/4) Epoch 14, batch 11750, loss[loss=0.1984, simple_loss=0.2519, pruned_loss=0.05368, ctc_loss=0.09374, over 16802.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2823, pruned_loss=0.06416, ctc_loss=0.1125, over 3301700.79 frames. ], batch size: 258, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:26:33,614 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2783582.6666666665, ans=0.125
2023-10-09 14:26:36,579 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2783629.3333333335, ans=0.05
2023-10-09 14:26:53,910 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0
2023-10-09 14:27:17,016 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2783769.3333333335, ans=0.0
2023-10-09 14:27:24,343 INFO [train.py:1031] (2/4) Epoch 14, batch 11800, loss[loss=0.26, simple_loss=0.3069, pruned_loss=0.07812, ctc_loss=0.142, over 16440.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.277, pruned_loss=0.06319, ctc_loss=0.1107, over 3303173.49 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:27:25,905 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2783816.0, ans=0.0
2023-10-09 14:27:26,964 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2783816.0, ans=0.2
2023-10-09 14:27:45,670 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.28 vs. limit=6.0
2023-10-09 14:27:48,402 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.56 vs. limit=22.5
2023-10-09 14:27:55,311 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:27:58,228 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2783909.3333333335, ans=0.2
2023-10-09 14:28:03,482 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.479e+02 3.032e+02 3.578e+02 4.296e+02 8.317e+02, threshold=7.156e+02, percent-clipped=0.0
2023-10-09 14:28:18,251 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2784002.6666666665, ans=0.0
2023-10-09 14:28:29,802 INFO [train.py:1031] (2/4) Epoch 14, batch 11850, loss[loss=0.3187, simple_loss=0.3913, pruned_loss=0.08932, ctc_loss=0.1685, over 16648.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2839, pruned_loss=0.06281, ctc_loss=0.1107, over 3290707.61 frames. ], batch size: 384, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:28:51,366 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2784096.0, ans=6.0
2023-10-09 14:29:02,456 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2784142.6666666665, ans=0.125
2023-10-09 14:29:18,748 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2784189.3333333335, ans=0.125
2023-10-09 14:29:32,396 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2784282.6666666665, ans=0.125
2023-10-09 14:29:33,116 INFO [train.py:1031] (2/4) Epoch 14, batch 11900, loss[loss=0.2065, simple_loss=0.3075, pruned_loss=0.03838, ctc_loss=0.07189, over 16268.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.2879, pruned_loss=0.06197, ctc_loss=0.1092, over 3288992.98 frames. ], batch size: 463, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:29:46,739 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2784329.3333333335, ans=0.0
2023-10-09 14:30:14,076 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.208e+02 3.772e+02 4.590e+02 1.035e+03, threshold=7.543e+02, percent-clipped=4.0
2023-10-09 14:30:18,920 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2784422.6666666665, ans=0.125
2023-10-09 14:30:36,488 INFO [train.py:1031] (2/4) Epoch 14, batch 11950, loss[loss=0.2819, simple_loss=0.3222, pruned_loss=0.08884, ctc_loss=0.1598, over 16779.00 frames. ], tot_loss[loss=0.2318, simple_loss=0.2902, pruned_loss=0.06416, ctc_loss=0.1131, over 3293858.85 frames. ], batch size: 329, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:30:49,824 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2784562.6666666665, ans=10.0
2023-10-09 14:31:00,375 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2784562.6666666665, ans=0.0
2023-10-09 14:31:34,189 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.63 vs. limit=10.0
2023-10-09 14:31:39,536 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2784749.3333333335, ans=0.125
2023-10-09 14:31:40,164 INFO [train.py:1031] (2/4) Epoch 14, batch 12000, loss[loss=0.2233, simple_loss=0.3401, pruned_loss=0.03809, ctc_loss=0.07582, over 15137.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2936, pruned_loss=0.06459, ctc_loss=0.1144, over 3298875.28 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:31:40,164 INFO [train.py:1054] (2/4) Computing validation loss
2023-10-09 14:31:54,599 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2358, simple_loss=0.3055, pruned_loss=0.064, ctc_loss=0.09509, over 1796401.00 frames.
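
The "Computing validation loss" / "validation:" pair above is the periodic validation pass: the model is switched to eval mode, the whole validation set is scored without gradients, and a frame-weighted average is reported. A sketch of that loop; the batch and model interfaces here are assumptions, not icefall's exact signatures:

    import torch

    def compute_validation_loss(model, valid_dl, device) -> float:
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                # assumed interface: the model returns (loss, num_frames) per batch
                loss, num_frames = model(batch, device=device)
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        return tot_loss / max(tot_frames, 1.0)
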
2023-10-09 14:31:54,600 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB
2023-10-09 14:32:25,425 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2784842.6666666665, ans=0.1
2023-10-09 14:32:36,798 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.351e+02 3.474e+02 4.193e+02 5.077e+02 1.283e+03, threshold=8.386e+02, percent-clipped=9.0
2023-10-09 14:33:00,831 INFO [train.py:1031] (2/4) Epoch 14, batch 12050, loss[loss=0.2244, simple_loss=0.288, pruned_loss=0.06008, ctc_loss=0.1017, over 16883.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2958, pruned_loss=0.06538, ctc_loss=0.1145, over 3307095.73 frames. ], batch size: 228, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:33:09,756 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2784982.6666666665, ans=0.1
2023-10-09 14:33:12,861 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0
2023-10-09 14:33:20,200 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2785029.3333333335, ans=0.0
2023-10-09 14:33:21,142 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2785029.3333333335, ans=0.0
2023-10-09 14:33:43,592 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.53 vs. limit=15.0
2023-10-09 14:33:50,474 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2785169.3333333335, ans=0.125
2023-10-09 14:33:56,396 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2785169.3333333335, ans=0.125
2023-10-09 14:33:56,723 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.48 vs. limit=6.0
2023-10-09 14:34:03,701 INFO [train.py:1031] (2/4) Epoch 14, batch 12100, loss[loss=0.2372, simple_loss=0.2795, pruned_loss=0.07327, ctc_loss=0.1209, over 16816.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.2948, pruned_loss=0.06601, ctc_loss=0.1149, over 3298991.11 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 8.0
2023-10-09 14:34:06,078 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2785216.0, ans=0.0
2023-10-09 14:34:07,382 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.43 vs. limit=15.0
2023-10-09 14:34:11,479 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2785216.0, ans=0.125
2023-10-09 14:34:45,439 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.634e+02 3.559e+02 4.280e+02 5.187e+02 9.097e+02, threshold=8.560e+02, percent-clipped=2.0
2023-10-09 14:35:06,629 INFO [train.py:1031] (2/4) Epoch 14, batch 12150, loss[loss=0.218, simple_loss=0.2695, pruned_loss=0.06268, ctc_loss=0.1028, over 16789.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.2961, pruned_loss=0.06776, ctc_loss=0.1186, over 3297207.11 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:35:07,042 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2785449.3333333335, ans=0.125
2023-10-09 14:35:39,613 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2785542.6666666665, ans=0.125
2023-10-09 14:35:59,812 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2785636.0, ans=10.0
2023-10-09 14:36:09,749 INFO [train.py:1031] (2/4) Epoch 14, batch 12200, loss[loss=0.2523, simple_loss=0.3672, pruned_loss=0.05023, ctc_loss=0.09222, over 15207.00 frames. ], tot_loss[loss=0.2507, simple_loss=0.3104, pruned_loss=0.07034, ctc_loss=0.1258, over 3299533.64 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:36:11,176 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0
2023-10-09 14:36:13,610 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:36:29,499 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=2785729.3333333335, ans=0.1
2023-10-09 14:36:29,881 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2785729.3333333335, ans=15.0
2023-10-09 14:36:30,555 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:36:32,612 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2785729.3333333335, ans=0.125
2023-10-09 14:36:37,986 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2785776.0, ans=0.1
2023-10-09 14:36:42,936 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0
2023-10-09 14:36:52,409 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.614e+02 4.622e+02 6.130e+02 1.347e+03, threshold=9.244e+02, percent-clipped=11.0
2023-10-09 14:37:12,097 INFO [train.py:1031] (2/4) Epoch 14, batch 12250, loss[loss=0.126, simple_loss=0.1773, pruned_loss=0.02809, ctc_loss=0.04643, over 11612.00 frames. ], tot_loss[loss=0.2425, simple_loss=0.3027, pruned_loss=0.06709, ctc_loss=0.1202, over 3282615.90 frames. ], batch size: 35, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:37:20,729 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2785916.0, ans=0.1
2023-10-09 14:37:28,090 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2785962.6666666665, ans=0.125
2023-10-09 14:37:46,226 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0
2023-10-09 14:38:03,075 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0
2023-10-09 14:38:12,140 INFO [train.py:1031] (2/4) Epoch 14, batch 12300, loss[loss=0.2194, simple_loss=0.2724, pruned_loss=0.05997, ctc_loss=0.1158, over 16714.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2927, pruned_loss=0.0657, ctc_loss=0.1171, over 3294435.13 frames. ], batch size: 309, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:38:39,997 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2786242.6666666665, ans=0.2
2023-10-09 14:38:43,356 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2786242.6666666665, ans=0.125
2023-10-09 14:38:47,759 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2786289.3333333335, ans=15.0
2023-10-09 14:38:54,452 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2786289.3333333335, ans=0.2
2023-10-09 14:38:55,171 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.363e+02 3.070e+02 3.744e+02 4.897e+02 1.313e+03, threshold=7.488e+02, percent-clipped=1.0
2023-10-09 14:39:13,380 INFO [train.py:1031] (2/4) Epoch 14, batch 12350, loss[loss=0.2353, simple_loss=0.3212, pruned_loss=0.05424, ctc_loss=0.1024, over 16811.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2942, pruned_loss=0.06518, ctc_loss=0.1157, over 3299550.56 frames. ], batch size: 272, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:39:14,430 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2786382.6666666665, ans=0.2
2023-10-09 14:39:20,285 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2786382.6666666665, ans=0.1
2023-10-09 14:39:25,505 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2786429.3333333335, ans=0.1
2023-10-09 14:39:29,304 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0
2023-10-09 14:39:49,191 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.51 vs. limit=12.0
2023-10-09 14:39:54,695 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2786522.6666666665, ans=0.125
2023-10-09 14:40:00,098 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2786522.6666666665, ans=0.05
2023-10-09 14:40:03,329 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2786569.3333333335, ans=0.1
2023-10-09 14:40:14,932 INFO [train.py:1031] (2/4) Epoch 14, batch 12400, loss[loss=0.2306, simple_loss=0.3048, pruned_loss=0.05723, ctc_loss=0.1046, over 16862.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2905, pruned_loss=0.06218, ctc_loss=0.1111, over 3302789.57 frames. ], batch size: 243, lr: 2.57e-03, grad_scale: 4.0
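
The constant "lr: 2.57e-03" on every [train.py:1031] line is the scheduler output at this point in training; with icefall's Eden scheduler it decays smoothly in both the batch and epoch indices. Roughly (formula reproduced from memory, so treat it as an approximation):

    def eden_lr(base_lr: float, step: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # lr shrinks like step**-0.5 * epoch**-0.5 once well past lr_batches/lr_epochs
        return (base_lr
                * ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)
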
2023-10-09 14:40:33,561 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2786662.6666666665, ans=0.2
2023-10-09 14:40:48,665 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2786709.3333333335, ans=0.1
2023-10-09 14:41:00,376 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.233e+02 3.612e+02 4.098e+02 6.929e+02, threshold=7.223e+02, percent-clipped=0.0
2023-10-09 14:41:07,874 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0
2023-10-09 14:41:12,301 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2786802.6666666665, ans=0.035
2023-10-09 14:41:17,628 INFO [train.py:1031] (2/4) Epoch 14, batch 12450, loss[loss=0.281, simple_loss=0.2928, pruned_loss=0.1008, ctc_loss=0.169, over 10215.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2884, pruned_loss=0.06132, ctc_loss=0.1098, over 3289176.12 frames. ], batch size: 35, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:41:31,601 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2786896.0, ans=0.2
2023-10-09 14:41:49,875 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2786942.6666666665, ans=0.125
2023-10-09 14:41:58,729 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2786989.3333333335, ans=0.0
2023-10-09 14:42:10,682 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2787036.0, ans=0.0
2023-10-09 14:42:19,508 INFO [train.py:1031] (2/4) Epoch 14, batch 12500, loss[loss=0.1824, simple_loss=0.2595, pruned_loss=0.03895, ctc_loss=0.0684, over 16736.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2869, pruned_loss=0.05945, ctc_loss=0.1066, over 3295684.66 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:42:19,916 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2787082.6666666665, ans=0.125
2023-10-09 14:42:20,079 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.71 vs. limit=10.0
2023-10-09 14:42:32,164 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2787129.3333333335, ans=0.0
2023-10-09 14:42:41,254 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2787129.3333333335, ans=0.125
2023-10-09 14:42:55,238 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2787176.0, ans=0.125
2023-10-09 14:43:07,779 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0
2023-10-09 14:43:07,996 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.141e+02 2.957e+02 3.304e+02 4.556e+02 8.176e+02, threshold=6.608e+02, percent-clipped=1.0
2023-10-09 14:43:23,452 INFO [train.py:1031] (2/4) Epoch 14, batch 12550, loss[loss=0.2009, simple_loss=0.2646, pruned_loss=0.05115, ctc_loss=0.08749, over 16648.00 frames. ], tot_loss[loss=0.221, simple_loss=0.285, pruned_loss=0.05775, ctc_loss=0.1037, over 3294506.79 frames. ], batch size: 140, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:43:46,422 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2787409.3333333335, ans=0.1
2023-10-09 14:43:48,811 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0
2023-10-09 14:44:21,741 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2787502.6666666665, ans=0.04949747468305833
2023-10-09 14:44:23,449 INFO [train.py:1031] (2/4) Epoch 14, batch 12600, loss[loss=0.2101, simple_loss=0.2664, pruned_loss=0.05659, ctc_loss=0.1014, over 16761.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2826, pruned_loss=0.05538, ctc_loss=0.1, over 3288257.61 frames. ], batch size: 140, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:44:25,418 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2787549.3333333335, ans=0.1
2023-10-09 14:44:39,888 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2787596.0, ans=0.125
2023-10-09 14:44:43,521 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2787596.0, ans=0.2
2023-10-09 14:44:48,893 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2787642.6666666665, ans=0.125
2023-10-09 14:45:11,285 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 3.144e+02 3.491e+02 4.128e+02 9.398e+02, threshold=6.982e+02, percent-clipped=1.0
2023-10-09 14:45:15,440 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2787736.0, ans=0.0
2023-10-09 14:45:24,790 INFO [train.py:1031] (2/4) Epoch 14, batch 12650, loss[loss=0.2119, simple_loss=0.2648, pruned_loss=0.05885, ctc_loss=0.1031, over 16644.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.2823, pruned_loss=0.05775, ctc_loss=0.1037, over 3298904.32 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:45:25,174 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2787782.6666666665, ans=0.125
2023-10-09 14:45:36,772 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2787829.3333333335, ans=0.125
2023-10-09 14:46:24,978 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2788016.0, ans=0.125
2023-10-09 14:46:26,369 INFO [train.py:1031] (2/4) Epoch 14, batch 12700, loss[loss=0.2153, simple_loss=0.2766, pruned_loss=0.05773, ctc_loss=0.09652, over 16963.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.278, pruned_loss=0.05908, ctc_loss=0.1053, over 3304725.27 frames. ], batch size: 86, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:46:44,851 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0
2023-10-09 14:46:48,327 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=12.0
2023-10-09 14:46:49,099 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2788062.6666666665, ans=0.125
2023-10-09 14:46:53,433 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2788109.3333333335, ans=0.125
2023-10-09 14:47:15,781 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.438e+02 3.456e+02 3.994e+02 4.833e+02 1.526e+03, threshold=7.989e+02, percent-clipped=4.0
2023-10-09 14:47:27,067 INFO [train.py:1031] (2/4) Epoch 14, batch 12750, loss[loss=0.3017, simple_loss=0.3416, pruned_loss=0.0952, ctc_loss=0.1783, over 16794.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2786, pruned_loss=0.06174, ctc_loss=0.1097, over 3310536.75 frames. ], batch size: 309, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:48:03,651 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2788389.3333333335, ans=0.125
2023-10-09 14:48:12,540 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:48:21,976 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2788436.0, ans=0.0
2023-10-09 14:48:22,014 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2788436.0, ans=0.0
2023-10-09 14:48:23,041 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2788436.0, ans=0.125
2023-10-09 14:48:29,383 INFO [train.py:1031] (2/4) Epoch 14, batch 12800, loss[loss=0.2231, simple_loss=0.2983, pruned_loss=0.05453, ctc_loss=0.09689, over 16843.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2878, pruned_loss=0.0642, ctc_loss=0.114, over 3311739.39 frames. ], batch size: 228, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:48:29,647 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2788482.6666666665, ans=0.2
2023-10-09 14:48:30,211 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0
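
The grad_scale value in the [train.py:1031] lines (varying between 1.0 and 8.0 across this section) tracks fp16 loss scaling: the scale grows while gradients stay finite and is cut back when an overflow is detected. A minimal sketch of the same mechanism using PyTorch's stock GradScaler; icefall manages the scale itself inside its training loop:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0)

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)  # assumed: the model returns a scalar loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped when inf/nan gradients are found
        scaler.update()          # grows or shrinks the scale, as logged above
        return loss.detach(), scaler.get_scale()
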
2023-10-09 14:48:54,942 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2788576.0, ans=0.125
2023-10-09 14:49:18,037 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.556e+02 3.551e+02 3.934e+02 4.932e+02 8.018e+02, threshold=7.868e+02, percent-clipped=1.0
2023-10-09 14:49:20,629 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2788669.3333333335, ans=0.125
2023-10-09 14:49:29,017 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2788669.3333333335, ans=0.0
2023-10-09 14:49:30,704 INFO [train.py:1031] (2/4) Epoch 14, batch 12850, loss[loss=0.2191, simple_loss=0.2802, pruned_loss=0.05737, ctc_loss=0.1079, over 16709.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.2917, pruned_loss=0.06466, ctc_loss=0.1145, over 3308713.92 frames. ], batch size: 140, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:49:37,206 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.60 vs. limit=15.0
2023-10-09 14:49:39,560 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2788716.0, ans=0.07
2023-10-09 14:49:46,236 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2788762.6666666665, ans=0.125
2023-10-09 14:49:51,151 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2788762.6666666665, ans=0.125
2023-10-09 14:50:11,788 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2788856.0, ans=0.1
2023-10-09 14:50:25,641 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:50:32,943 INFO [train.py:1031] (2/4) Epoch 14, batch 12900, loss[loss=0.3203, simple_loss=0.3845, pruned_loss=0.09462, ctc_loss=0.1673, over 16602.00 frames. ], tot_loss[loss=0.2413, simple_loss=0.2988, pruned_loss=0.06792, ctc_loss=0.1199, over 3308244.01 frames. ], batch size: 351, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 14:50:39,186 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2788949.3333333335, ans=0.07
2023-10-09 14:50:41,893 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2788949.3333333335, ans=0.0
2023-10-09 14:50:47,175 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2788996.0, ans=0.125
2023-10-09 14:50:48,917 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2788996.0, ans=0.125
2023-10-09 14:50:52,585 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 14:51:00,328 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2789042.6666666665, ans=0.125
2023-10-09 14:51:26,857 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.276e+02 3.447e+02 3.800e+02 4.409e+02 9.438e+02, threshold=7.600e+02, percent-clipped=3.0
2023-10-09 14:51:31,526 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2789136.0, ans=0.09899494936611666
2023-10-09 14:51:35,867 INFO [train.py:1031] (2/4) Epoch 14, batch 12950, loss[loss=0.1717, simple_loss=0.2439, pruned_loss=0.03662, ctc_loss=0.06574, over 16701.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.299, pruned_loss=0.06429, ctc_loss=0.1145, over 3307175.29 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 14:51:37,222 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2789182.6666666665, ans=0.04949747468305833
2023-10-09 14:51:42,628 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2789182.6666666665, ans=0.0
2023-10-09 14:51:50,826 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.67 vs. limit=6.0
2023-10-09 14:51:59,253 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2789229.3333333335, ans=0.2
2023-10-09 14:52:09,398 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2789276.0, ans=0.0
2023-10-09 14:52:16,819 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2789322.6666666665, ans=0.125
2023-10-09 14:52:36,322 INFO [train.py:1031] (2/4) Epoch 14, batch 13000, loss[loss=0.1769, simple_loss=0.2372, pruned_loss=0.04362, ctc_loss=0.07342, over 16798.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2897, pruned_loss=0.06179, ctc_loss=0.1095, over 3300638.22 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 2.0
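
Every loss[...] entry decomposes into simple_loss and pruned_loss, the two stages of pruned RNN-T training, plus ctc_loss from the auxiliary CTC head. A sketch of one plausible combination; the exact scales and any warm-up ramp come from the training configuration, not from these log lines:

    def combine_losses(simple_loss, pruned_loss, ctc_loss,
                       simple_scale: float = 0.5, ctc_scale: float = 0.2):
        # assumed weighting: blend the two transducer terms, then add scaled CTC
        transducer = simple_scale * simple_loss + (1.0 - simple_scale) * pruned_loss
        return transducer + ctc_scale * ctc_loss
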
], batch size: 102, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:52:41,420 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2789416.0, ans=0.1 2023-10-09 14:52:56,099 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2789462.6666666665, ans=0.125 2023-10-09 14:53:09,111 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2789509.3333333335, ans=0.0 2023-10-09 14:53:14,249 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-10-09 14:53:20,392 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2789556.0, ans=0.5 2023-10-09 14:53:25,647 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2789602.6666666665, ans=0.125 2023-10-09 14:53:28,013 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.825e+02 3.282e+02 3.972e+02 1.143e+03, threshold=6.563e+02, percent-clipped=1.0 2023-10-09 14:53:33,474 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.50 vs. limit=10.0 2023-10-09 14:53:36,421 INFO [train.py:1031] (2/4) Epoch 14, batch 13050, loss[loss=0.2068, simple_loss=0.2556, pruned_loss=0.05814, ctc_loss=0.1044, over 16774.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.2833, pruned_loss=0.06177, ctc_loss=0.1094, over 3303883.74 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:53:51,786 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2789696.0, ans=0.0 2023-10-09 14:54:05,805 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2789742.6666666665, ans=0.0 2023-10-09 14:54:11,815 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2789789.3333333335, ans=0.125 2023-10-09 14:54:12,824 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2789789.3333333335, ans=0.2 2023-10-09 14:54:15,886 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2023-10-09 14:54:16,663 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2789789.3333333335, ans=0.0 2023-10-09 14:54:17,859 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2023-10-09 14:54:37,341 INFO [train.py:1031] (2/4) Epoch 14, batch 13100, loss[loss=0.2531, simple_loss=0.3146, pruned_loss=0.0693, ctc_loss=0.1325, over 16905.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2821, pruned_loss=0.06373, ctc_loss=0.1124, over 3299555.76 frames. 
], batch size: 228, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:54:52,746 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2789929.3333333335, ans=0.0 2023-10-09 14:54:58,690 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2789929.3333333335, ans=0.1 2023-10-09 14:55:00,367 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2789929.3333333335, ans=0.125 2023-10-09 14:55:05,505 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2789976.0, ans=0.0 2023-10-09 14:55:08,454 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2789976.0, ans=0.05 2023-10-09 14:55:24,580 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.52 vs. limit=10.0 2023-10-09 14:55:32,623 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+02 3.251e+02 4.048e+02 5.157e+02 1.010e+03, threshold=8.097e+02, percent-clipped=11.0 2023-10-09 14:55:42,104 INFO [train.py:1031] (2/4) Epoch 14, batch 13150, loss[loss=0.2585, simple_loss=0.3256, pruned_loss=0.0683, ctc_loss=0.1367, over 15178.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.294, pruned_loss=0.06601, ctc_loss=0.1175, over 3301343.88 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 14:55:43,617 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2790116.0, ans=0.125 2023-10-09 14:55:44,613 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:55:47,911 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2790116.0, ans=0.1 2023-10-09 14:55:47,996 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2790116.0, ans=0.125 2023-10-09 14:55:58,726 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2790162.6666666665, ans=0.1 2023-10-09 14:56:09,205 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.67 vs. limit=22.5 2023-10-09 14:56:28,246 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2790256.0, ans=0.2 2023-10-09 14:56:36,735 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2790302.6666666665, ans=0.125 2023-10-09 14:56:36,801 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2790302.6666666665, ans=0.0 2023-10-09 14:56:38,576 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2790302.6666666665, ans=0.0 2023-10-09 14:56:45,812 INFO [train.py:1031] (2/4) Epoch 14, batch 13200, loss[loss=0.2545, simple_loss=0.3144, pruned_loss=0.07295, ctc_loss=0.1218, over 16705.00 frames. 
], tot_loss[loss=0.2422, simple_loss=0.299, pruned_loss=0.06838, ctc_loss=0.1216, over 3295786.19 frames. ], batch size: 95, lr: 2.57e-03, grad_scale: 8.0 2023-10-09 14:56:56,370 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2023-10-09 14:56:56,957 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2790396.0, ans=0.125 2023-10-09 14:57:06,088 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2790396.0, ans=0.0 2023-10-09 14:57:24,778 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 14:57:32,266 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2790489.3333333335, ans=0.125 2023-10-09 14:57:37,191 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2790536.0, ans=0.0 2023-10-09 14:57:41,601 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.305e+02 3.287e+02 3.760e+02 4.559e+02 7.411e+02, threshold=7.519e+02, percent-clipped=0.0 2023-10-09 14:57:43,594 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2790536.0, ans=0.1 2023-10-09 14:57:48,112 INFO [train.py:1031] (2/4) Epoch 14, batch 13250, loss[loss=0.1999, simple_loss=0.2541, pruned_loss=0.05306, ctc_loss=0.09908, over 16763.00 frames. ], tot_loss[loss=0.2414, simple_loss=0.2998, pruned_loss=0.06748, ctc_loss=0.1203, over 3303115.69 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 14:57:49,505 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2790582.6666666665, ans=0.09899494936611666 2023-10-09 14:57:57,368 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2023-10-09 14:58:10,926 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2790629.3333333335, ans=0.2 2023-10-09 14:58:17,081 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.70 vs. limit=10.0 2023-10-09 14:58:36,718 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2790769.3333333335, ans=0.125 2023-10-09 14:58:49,206 INFO [train.py:1031] (2/4) Epoch 14, batch 13300, loss[loss=0.2204, simple_loss=0.2812, pruned_loss=0.05863, ctc_loss=0.1059, over 16807.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.2918, pruned_loss=0.06668, ctc_loss=0.1184, over 3298421.60 frames. 
2023-10-09 14:58:49,617 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2790816.0, ans=0.1
2023-10-09 14:58:53,757 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2790816.0, ans=0.0
2023-10-09 14:59:14,729 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2790909.3333333335, ans=0.125
2023-10-09 14:59:42,182 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2791002.6666666665, ans=0.125
2023-10-09 14:59:47,826 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.520e+02 3.361e+02 3.791e+02 4.833e+02 1.183e+03, threshold=7.583e+02, percent-clipped=5.0
2023-10-09 14:59:50,989 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2791002.6666666665, ans=0.125
2023-10-09 14:59:52,781 INFO [train.py:1031] (2/4) Epoch 14, batch 13350, loss[loss=0.2057, simple_loss=0.2638, pruned_loss=0.05538, ctc_loss=0.09205, over 16648.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2895, pruned_loss=0.06502, ctc_loss=0.1152, over 3295583.27 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 1.0
2023-10-09 14:59:54,302 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2791049.3333333335, ans=0.2
2023-10-09 14:59:59,315 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2791049.3333333335, ans=0.125
2023-10-09 15:00:20,626 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0
2023-10-09 15:00:35,445 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2791189.3333333335, ans=0.1
2023-10-09 15:00:41,037 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2791189.3333333335, ans=0.0
2023-10-09 15:00:50,582 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2791236.0, ans=0.0
2023-10-09 15:00:55,814 INFO [train.py:1031] (2/4) Epoch 14, batch 13400, loss[loss=0.2046, simple_loss=0.2521, pruned_loss=0.05797, ctc_loss=0.103, over 16713.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2912, pruned_loss=0.06558, ctc_loss=0.1147, over 3292697.23 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 2.0
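
Note on the train.py:1031 records: the printed totals are consistent with a fixed weighting of the three components, loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss (weights inferred from the numbers themselves, not quoted from the configuration). Checked against the batch 13400 record just above: 0.5 * 0.2521 + 0.05797 + 0.2 * 0.103 = 0.2046 matches loss=0.2046, and 0.5 * 0.2912 + 0.06558 + 0.2 * 0.1147 = 0.2341 matches tot_loss=0.2341.
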
2023-10-09 15:00:59,035 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:01:16,754 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2791329.3333333335, ans=0.0
2023-10-09 15:01:23,652 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2791376.0, ans=0.125
2023-10-09 15:01:30,893 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2791376.0, ans=0.0
2023-10-09 15:01:31,925 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2791376.0, ans=0.125
2023-10-09 15:01:36,234 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2791422.6666666665, ans=0.0
2023-10-09 15:01:55,324 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+02 3.467e+02 4.135e+02 5.221e+02 9.023e+02, threshold=8.270e+02, percent-clipped=2.0
2023-10-09 15:01:57,438 INFO [train.py:1031] (2/4) Epoch 14, batch 13450, loss[loss=0.1938, simple_loss=0.2475, pruned_loss=0.05281, ctc_loss=0.08619, over 16015.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2852, pruned_loss=0.06492, ctc_loss=0.1135, over 3302518.14 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 1.0
2023-10-09 15:02:26,960 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=22.5
2023-10-09 15:02:31,493 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2791609.3333333335, ans=0.0
2023-10-09 15:02:45,945 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2791702.6666666665, ans=0.1
2023-10-09 15:02:59,450 INFO [train.py:1031] (2/4) Epoch 14, batch 13500, loss[loss=0.1731, simple_loss=0.2459, pruned_loss=0.03715, ctc_loss=0.06526, over 16663.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2804, pruned_loss=0.06241, ctc_loss=0.1093, over 3304802.88 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 1.0
2023-10-09 15:03:23,439 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2791842.6666666665, ans=0.0
2023-10-09 15:03:31,214 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2791842.6666666665, ans=0.0
2023-10-09 15:03:45,805 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2791889.3333333335, ans=0.0
2023-10-09 15:03:53,532 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2791936.0, ans=0.1
2023-10-09 15:04:01,768 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.985e+02 3.501e+02 4.657e+02 8.617e+02, threshold=7.002e+02, percent-clipped=1.0
2023-10-09 15:04:01,793 INFO [train.py:1031] (2/4) Epoch 14, batch 13550, loss[loss=0.2946, simple_loss=0.3547, pruned_loss=0.08882, ctc_loss=0.1425, over 16818.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2814, pruned_loss=0.06235, ctc_loss=0.109, over 3303587.20 frames. ], batch size: 95, lr: 2.57e-03, grad_scale: 0.5
2023-10-09 15:04:08,259 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2791982.6666666665, ans=0.0
2023-10-09 15:04:18,589 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2792029.3333333335, ans=0.1
2023-10-09 15:04:33,151 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:04:40,887 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2792122.6666666665, ans=0.125
2023-10-09 15:05:01,180 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0
2023-10-09 15:05:05,325 INFO [train.py:1031] (2/4) Epoch 14, batch 13600, loss[loss=0.2843, simple_loss=0.3199, pruned_loss=0.09093, ctc_loss=0.1671, over 15244.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2869, pruned_loss=0.06455, ctc_loss=0.113, over 3306095.81 frames. ], batch size: 529, lr: 2.57e-03, grad_scale: 1.0
2023-10-09 15:05:10,018 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2792216.0, ans=0.125
2023-10-09 15:05:13,173 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2792216.0, ans=0.0
2023-10-09 15:05:19,799 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2792262.6666666665, ans=0.125
2023-10-09 15:05:20,809 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2792262.6666666665, ans=0.125
2023-10-09 15:05:23,488 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2792262.6666666665, ans=0.0
2023-10-09 15:05:35,939 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2792309.3333333335, ans=0.125
2023-10-09 15:05:42,505 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2792356.0, ans=0.125
2023-10-09 15:05:48,442 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2792356.0, ans=0.2
2023-10-09 15:05:52,209 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2792356.0, ans=0.07
2023-10-09 15:06:08,500 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 3.317e+02 4.234e+02 5.653e+02 1.556e+03, threshold=8.468e+02, percent-clipped=11.0
2023-10-09 15:06:08,528 INFO [train.py:1031] (2/4) Epoch 14, batch 13650, loss[loss=0.2389, simple_loss=0.3199, pruned_loss=0.05917, ctc_loss=0.09914, over 16807.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2882, pruned_loss=0.06117, ctc_loss=0.1077, over 3303852.52 frames. ], batch size: 272, lr: 2.57e-03, grad_scale: 1.0
2023-10-09 15:06:15,723 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.64 vs. limit=15.0
2023-10-09 15:06:29,267 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2792496.0, ans=0.125
2023-10-09 15:06:39,674 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2792542.6666666665, ans=0.125
2023-10-09 15:07:11,262 INFO [train.py:1031] (2/4) Epoch 14, batch 13700, loss[loss=0.2573, simple_loss=0.3302, pruned_loss=0.06854, ctc_loss=0.1182, over 16888.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2927, pruned_loss=0.06145, ctc_loss=0.1089, over 3303374.31 frames. ], batch size: 292, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:07:24,106 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2792729.3333333335, ans=0.0
2023-10-09 15:07:32,777 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2792729.3333333335, ans=0.0
2023-10-09 15:07:49,495 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2792822.6666666665, ans=0.0
2023-10-09 15:08:01,181 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2792869.3333333335, ans=0.0
2023-10-09 15:08:15,042 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 3.067e+02 3.765e+02 4.506e+02 1.005e+03, threshold=7.530e+02, percent-clipped=2.0
2023-10-09 15:08:15,069 INFO [train.py:1031] (2/4) Epoch 14, batch 13750, loss[loss=0.2167, simple_loss=0.2834, pruned_loss=0.05622, ctc_loss=0.09414, over 16679.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2941, pruned_loss=0.0602, ctc_loss=0.1076, over 3296064.25 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:08:29,664 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:08:40,931 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2793009.3333333335, ans=0.125
2023-10-09 15:08:47,263 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0
2023-10-09 15:08:54,078 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2793056.0, ans=0.125
2023-10-09 15:08:59,874 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2793056.0, ans=0.2
2023-10-09 15:09:13,267 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2793102.6666666665, ans=0.0
2023-10-09 15:09:17,801 INFO [train.py:1031] (2/4) Epoch 14, batch 13800, loss[loss=0.3014, simple_loss=0.3515, pruned_loss=0.09323, ctc_loss=0.1621, over 16994.00 frames. ], tot_loss[loss=0.2348, simple_loss=0.2977, pruned_loss=0.06349, ctc_loss=0.1126, over 3297256.82 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 4.0
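
Note on the scaling.py:979 entries: each Whitening line compares a per-module statistic against a limit (e.g. metric=12.28 vs. limit=15.0 above), and the predominance of metric < limit suggests a diagnostic that only intervenes once the metric exceeds its limit. A toy sketch of such a check, assuming the metric measures how unevenly the feature covariance spreads across eigenvalue directions (the formula below is an illustrative assumption, not icefall's exact definition):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels). Returns >= 1.0, with 1.0 only
        # when each group's covariance has all eigenvalues equal (white).
        n, c = x.shape
        worst = 1.0
        for g in x.reshape(n, num_groups, c // num_groups).unbind(dim=1):
            cov = (g.T @ g) / n
            eigs = torch.linalg.eigvalsh(cov).clamp(min=1e-20)
            worst = max(worst, ((eigs ** 2).mean() / eigs.mean() ** 2).item())
        return worst

    feats = torch.randn(1000, 384)             # well-conditioned features
    m = whitening_metric(feats, num_groups=1)
    penalize = m > 15.0                        # mirrors "metric=... vs. limit=..."
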
2023-10-09 15:09:23,726 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2793149.3333333335, ans=0.0
2023-10-09 15:09:42,849 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2793242.6666666665, ans=0.125
2023-10-09 15:09:52,612 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=15.0
2023-10-09 15:10:00,971 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2793289.3333333335, ans=0.125
2023-10-09 15:10:11,699 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0
2023-10-09 15:10:21,758 INFO [train.py:1031] (2/4) Epoch 14, batch 13850, loss[loss=0.1706, simple_loss=0.2299, pruned_loss=0.04164, ctc_loss=0.06983, over 16713.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2905, pruned_loss=0.06362, ctc_loss=0.1127, over 3297845.33 frames. ], batch size: 188, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:10:22,094 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2793382.6666666665, ans=0.0
2023-10-09 15:10:22,840 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.613e+02 3.299e+02 3.702e+02 4.236e+02 7.153e+02, threshold=7.404e+02, percent-clipped=0.0
2023-10-09 15:10:27,073 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2793382.6666666665, ans=0.2
2023-10-09 15:10:31,870 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2793382.6666666665, ans=0.04949747468305833
2023-10-09 15:10:49,590 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2793476.0, ans=0.125
2023-10-09 15:10:53,931 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2793476.0, ans=0.0
2023-10-09 15:11:03,433 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:11:03,444 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2793522.6666666665, ans=0.0
2023-10-09 15:11:25,336 INFO [train.py:1031] (2/4) Epoch 14, batch 13900, loss[loss=0.2162, simple_loss=0.293, pruned_loss=0.05181, ctc_loss=0.08937, over 16818.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2867, pruned_loss=0.06359, ctc_loss=0.1127, over 3301719.43 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:11:40,876 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2793662.6666666665, ans=0.125
2023-10-09 15:11:44,578 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2793662.6666666665, ans=0.125
2023-10-09 15:11:48,548 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:11:55,282 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2793709.3333333335, ans=0.0
2023-10-09 15:12:14,774 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2793802.6666666665, ans=0.0
2023-10-09 15:12:17,010 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2793802.6666666665, ans=0.125
2023-10-09 15:12:28,097 INFO [train.py:1031] (2/4) Epoch 14, batch 13950, loss[loss=0.2624, simple_loss=0.3195, pruned_loss=0.077, ctc_loss=0.1285, over 16863.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.2937, pruned_loss=0.06367, ctc_loss=0.113, over 3288920.56 frames. ], batch size: 176, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:12:30,205 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.564e+02 3.293e+02 3.736e+02 4.752e+02 8.901e+02, threshold=7.472e+02, percent-clipped=3.0
2023-10-09 15:12:44,651 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2793896.0, ans=0.125
2023-10-09 15:12:56,900 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2793942.6666666665, ans=0.125
2023-10-09 15:12:57,988 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2793942.6666666665, ans=0.0
2023-10-09 15:13:01,270 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2793942.6666666665, ans=0.5
2023-10-09 15:13:03,867 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2793942.6666666665, ans=0.0
2023-10-09 15:13:06,623 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2793989.3333333335, ans=0.1
2023-10-09 15:13:19,389 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2794036.0, ans=0.125
2023-10-09 15:13:21,539 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.01 vs. limit=10.0
2023-10-09 15:13:22,578 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0
2023-10-09 15:13:24,572 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2794036.0, ans=0.0
2023-10-09 15:13:26,720 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2794036.0, ans=0.125
2023-10-09 15:13:31,758 INFO [train.py:1031] (2/4) Epoch 14, batch 14000, loss[loss=0.2393, simple_loss=0.3012, pruned_loss=0.06566, ctc_loss=0.1151, over 16834.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.2979, pruned_loss=0.06596, ctc_loss=0.1167, over 3296361.48 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:13:54,187 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2794129.3333333335, ans=0.1
2023-10-09 15:14:05,856 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2794176.0, ans=0.125
2023-10-09 15:14:09,116 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2794176.0, ans=0.125
2023-10-09 15:14:10,228 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2794222.6666666665, ans=0.125
2023-10-09 15:14:34,471 INFO [train.py:1031] (2/4) Epoch 14, batch 14050, loss[loss=0.2218, simple_loss=0.2715, pruned_loss=0.06169, ctc_loss=0.1218, over 16777.00 frames. ], tot_loss[loss=0.2366, simple_loss=0.2961, pruned_loss=0.06535, ctc_loss=0.1159, over 3298781.46 frames. ], batch size: 292, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:14:38,946 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.337e+02 3.126e+02 3.568e+02 4.195e+02 6.339e+02, threshold=7.137e+02, percent-clipped=0.0
2023-10-09 15:14:39,370 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2794316.0, ans=0.0
2023-10-09 15:14:50,026 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2794362.6666666665, ans=0.0
2023-10-09 15:15:29,012 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2794502.6666666665, ans=0.125
2023-10-09 15:15:37,073 INFO [train.py:1031] (2/4) Epoch 14, batch 14100, loss[loss=0.1965, simple_loss=0.2552, pruned_loss=0.05122, ctc_loss=0.08812, over 16766.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.287, pruned_loss=0.06382, ctc_loss=0.1131, over 3279608.06 frames. ], batch size: 95, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:15:38,055 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2794549.3333333335, ans=0.0
2023-10-09 15:15:39,182 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2794549.3333333335, ans=0.125
2023-10-09 15:15:41,863 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0
2023-10-09 15:15:51,803 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2794596.0, ans=0.125
2023-10-09 15:15:58,965 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2794596.0, ans=0.125
2023-10-09 15:16:07,582 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.03 vs. limit=12.0
2023-10-09 15:16:37,979 INFO [train.py:1031] (2/4) Epoch 14, batch 14150, loss[loss=0.2232, simple_loss=0.2432, pruned_loss=0.07529, ctc_loss=0.1314, over 16339.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2793, pruned_loss=0.06303, ctc_loss=0.1112, over 3276309.73 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:16:44,072 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.061e+02 3.515e+02 4.416e+02 9.283e+02, threshold=7.030e+02, percent-clipped=2.0
2023-10-09 15:17:01,688 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2794829.3333333335, ans=0.0
2023-10-09 15:17:03,838 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2794876.0, ans=0.0
2023-10-09 15:17:23,171 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2794922.6666666665, ans=0.125
2023-10-09 15:17:39,323 INFO [train.py:1031] (2/4) Epoch 14, batch 14200, loss[loss=0.2019, simple_loss=0.2663, pruned_loss=0.05079, ctc_loss=0.08983, over 16624.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2749, pruned_loss=0.06046, ctc_loss=0.1071, over 3288276.93 frames. ], batch size: 151, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:17:40,177 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.63 vs. limit=15.0
2023-10-09 15:18:00,847 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2795062.6666666665, ans=0.1
2023-10-09 15:18:00,873 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2795062.6666666665, ans=0.025
2023-10-09 15:18:06,828 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2795109.3333333335, ans=0.125
2023-10-09 15:18:14,247 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2795109.3333333335, ans=0.125
2023-10-09 15:18:36,385 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2795202.6666666665, ans=0.0
2023-10-09 15:18:37,980 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.55 vs. limit=15.0
2023-10-09 15:18:43,118 INFO [train.py:1031] (2/4) Epoch 14, batch 14250, loss[loss=0.3084, simple_loss=0.3355, pruned_loss=0.1033, ctc_loss=0.187, over 16629.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2788, pruned_loss=0.06212, ctc_loss=0.1093, over 3289225.42 frames. ], batch size: 351, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:18:46,194 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2795249.3333333335, ans=0.1
2023-10-09 15:18:49,207 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 2.880e+02 3.480e+02 3.927e+02 7.059e+02, threshold=6.960e+02, percent-clipped=1.0
2023-10-09 15:18:50,024 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=12.0
2023-10-09 15:19:07,489 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2795296.0, ans=0.125
2023-10-09 15:19:16,423 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0
2023-10-09 15:19:24,119 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2795389.3333333335, ans=0.125
2023-10-09 15:19:44,953 INFO [train.py:1031] (2/4) Epoch 14, batch 14300, loss[loss=0.2248, simple_loss=0.2606, pruned_loss=0.06807, ctc_loss=0.1319, over 15442.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2837, pruned_loss=0.06397, ctc_loss=0.1126, over 3291619.85 frames. ], batch size: 526, lr: 2.57e-03, grad_scale: 8.0
2023-10-09 15:19:54,426 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2795482.6666666665, ans=0.1
2023-10-09 15:19:58,536 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0
2023-10-09 15:20:22,423 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2795622.6666666665, ans=0.0
2023-10-09 15:20:47,320 INFO [train.py:1031] (2/4) Epoch 14, batch 14350, loss[loss=0.2168, simple_loss=0.2773, pruned_loss=0.05684, ctc_loss=0.1067, over 16994.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2827, pruned_loss=0.06472, ctc_loss=0.1135, over 3305099.83 frames. ], batch size: 258, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:20:53,860 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.121e+02 3.540e+02 4.017e+02 5.602e+02, threshold=7.080e+02, percent-clipped=0.0
2023-10-09 15:20:56,992 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2795716.0, ans=0.125
2023-10-09 15:21:13,223 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2795809.3333333335, ans=0.125
2023-10-09 15:21:22,595 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2795809.3333333335, ans=0.1
2023-10-09 15:21:30,859 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0
2023-10-09 15:21:31,777 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2795856.0, ans=0.125
2023-10-09 15:21:37,949 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0
2023-10-09 15:21:43,490 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2795902.6666666665, ans=0.125
2023-10-09 15:21:44,710 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2795902.6666666665, ans=0.125
2023-10-09 15:21:44,730 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2795902.6666666665, ans=0.0
2023-10-09 15:21:50,348 INFO [train.py:1031] (2/4) Epoch 14, batch 14400, loss[loss=0.198, simple_loss=0.2437, pruned_loss=0.05624, ctc_loss=0.09972, over 11281.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2826, pruned_loss=0.06415, ctc_loss=0.1127, over 3305361.72 frames. ], batch size: 36, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:21:51,859 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2795949.3333333335, ans=0.025
2023-10-09 15:21:56,110 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0
2023-10-09 15:22:11,612 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2795996.0, ans=0.2
2023-10-09 15:22:29,741 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2796089.3333333335, ans=0.125
2023-10-09 15:22:45,922 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2796136.0, ans=0.125
2023-10-09 15:22:53,855 INFO [train.py:1031] (2/4) Epoch 14, batch 14450, loss[loss=0.2285, simple_loss=0.2895, pruned_loss=0.06302, ctc_loss=0.104, over 16539.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2875, pruned_loss=0.06574, ctc_loss=0.1159, over 3314224.67 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:22:58,964 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2796182.6666666665, ans=0.0
2023-10-09 15:23:00,761 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.391e+02 3.304e+02 3.703e+02 4.462e+02 6.927e+02, threshold=7.405e+02, percent-clipped=0.0
2023-10-09 15:23:45,872 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=22.5
2023-10-09 15:23:48,740 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2796369.3333333335, ans=0.0
2023-10-09 15:23:54,673 INFO [train.py:1031] (2/4) Epoch 14, batch 14500, loss[loss=0.2427, simple_loss=0.2874, pruned_loss=0.07248, ctc_loss=0.1327, over 16809.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2872, pruned_loss=0.06487, ctc_loss=0.1144, over 3316456.85 frames. ], batch size: 309, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:24:14,711 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2796462.6666666665, ans=0.09899494936611666
2023-10-09 15:24:38,720 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2796556.0, ans=15.0
2023-10-09 15:24:44,966 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2796602.6666666665, ans=0.0
2023-10-09 15:24:56,623 INFO [train.py:1031] (2/4) Epoch 14, batch 14550, loss[loss=0.1972, simple_loss=0.2436, pruned_loss=0.0554, ctc_loss=0.09993, over 16802.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2808, pruned_loss=0.0639, ctc_loss=0.1125, over 3304047.42 frames. ], batch size: 202, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:25:01,241 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=15.0
2023-10-09 15:25:02,959 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2796649.3333333335, ans=0.125
2023-10-09 15:25:05,844 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.462e+02 3.180e+02 3.818e+02 4.474e+02 1.185e+03, threshold=7.637e+02, percent-clipped=2.0
2023-10-09 15:25:07,255 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2796696.0, ans=0.125
2023-10-09 15:25:16,512 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2796696.0, ans=0.0
2023-10-09 15:25:20,234 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2796742.6666666665, ans=0.0
2023-10-09 15:25:34,485 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2796789.3333333335, ans=0.125
2023-10-09 15:25:34,828 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0
2023-10-09 15:25:34,920 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.02 vs. limit=22.5
2023-10-09 15:25:36,188 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2796789.3333333335, ans=0.125
2023-10-09 15:25:51,935 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2796836.0, ans=0.025
2023-10-09 15:25:56,523 INFO [train.py:1031] (2/4) Epoch 14, batch 14600, loss[loss=0.224, simple_loss=0.2683, pruned_loss=0.06615, ctc_loss=0.1183, over 15309.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2814, pruned_loss=0.06404, ctc_loss=0.1127, over 3310687.86 frames. ], batch size: 526, lr: 2.57e-03, grad_scale: 4.0
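
Note on the grad_scale values above: they drift between 0.5 and 8.0 from one interval to the next, which is the signature of dynamic loss scaling in mixed-precision training: the scale is cut when inf/nan gradients appear and grown back after a run of clean steps. A minimal sketch of such a loop with PyTorch's standard GradScaler (model, optimizer and batch are placeholders; init_scale and growth_interval are illustrative):

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=2.0, growth_interval=2000)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with autocast():
            loss = model(batch)           # placeholder forward pass
        scaler.scale(loss).backward()     # backprop through the scaled loss
        scaler.step(optimizer)            # skips the update on inf/nan grads
        scaler.update()                   # backoff x0.5, periodic growth x2.0
        return loss.detach(), scaler.get_scale()

Under this reading, a value like grad_scale: 0.5 marks a stretch where overflows forced the scaler down, and the climb back toward 8.0 marks recovery.
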
2023-10-09 15:25:56,877 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2796882.6666666665, ans=0.125
2023-10-09 15:26:07,411 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2796929.3333333335, ans=0.1
2023-10-09 15:26:37,280 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0
2023-10-09 15:26:45,990 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2797069.3333333335, ans=0.0
2023-10-09 15:26:51,292 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2797069.3333333335, ans=0.125
2023-10-09 15:26:54,021 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2797069.3333333335, ans=0.125
2023-10-09 15:26:56,319 INFO [train.py:1031] (2/4) Epoch 14, batch 14650, loss[loss=0.2878, simple_loss=0.308, pruned_loss=0.09888, ctc_loss=0.1745, over 16873.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2818, pruned_loss=0.06465, ctc_loss=0.1133, over 3302932.20 frames. ], batch size: 384, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:27:05,751 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+02 3.040e+02 3.470e+02 3.930e+02 6.552e+02, threshold=6.941e+02, percent-clipped=0.0
2023-10-09 15:27:10,815 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0
2023-10-09 15:27:32,868 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=12.0
2023-10-09 15:27:37,736 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=22.5
2023-10-09 15:27:41,430 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.02 vs. limit=15.0
2023-10-09 15:27:57,842 INFO [train.py:1031] (2/4) Epoch 14, batch 14700, loss[loss=0.2137, simple_loss=0.2586, pruned_loss=0.06224, ctc_loss=0.111, over 16791.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2785, pruned_loss=0.06411, ctc_loss=0.1124, over 3300837.87 frames. ], batch size: 121, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:28:00,372 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2797349.3333333335, ans=0.0
2023-10-09 15:28:09,058 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2797396.0, ans=0.07
2023-10-09 15:28:09,095 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2797396.0, ans=0.0
2023-10-09 15:28:15,628 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2797396.0, ans=0.1
2023-10-09 15:28:17,273 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.97 vs. limit=15.0
2023-10-09 15:28:41,520 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:28:50,100 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2797536.0, ans=0.0
2023-10-09 15:28:50,434 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0
2023-10-09 15:28:59,995 INFO [train.py:1031] (2/4) Epoch 14, batch 14750, loss[loss=0.1875, simple_loss=0.243, pruned_loss=0.04916, ctc_loss=0.08403, over 16752.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2732, pruned_loss=0.06289, ctc_loss=0.1102, over 3303707.08 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:29:04,188 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2797582.6666666665, ans=0.1
2023-10-09 15:29:11,862 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.255e+02 3.078e+02 3.394e+02 3.991e+02 6.777e+02, threshold=6.787e+02, percent-clipped=0.0
2023-10-09 15:29:14,983 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2797629.3333333335, ans=0.125
2023-10-09 15:29:46,199 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2797722.6666666665, ans=0.0
2023-10-09 15:29:57,883 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:30:01,443 INFO [train.py:1031] (2/4) Epoch 14, batch 14800, loss[loss=0.2055, simple_loss=0.268, pruned_loss=0.05282, ctc_loss=0.0934, over 16860.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2758, pruned_loss=0.0638, ctc_loss=0.112, over 3296269.51 frames. ], batch size: 202, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:30:09,443 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2797816.0, ans=0.125
2023-10-09 15:30:28,455 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2797909.3333333335, ans=0.125
2023-10-09 15:31:05,100 INFO [train.py:1031] (2/4) Epoch 14, batch 14850, loss[loss=0.2261, simple_loss=0.271, pruned_loss=0.06849, ctc_loss=0.1106, over 16927.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2769, pruned_loss=0.06551, ctc_loss=0.1146, over 3300626.74 frames. ], batch size: 86, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:31:06,697 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=22.5
2023-10-09 15:31:09,067 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2798049.3333333335, ans=0.95
2023-10-09 15:31:14,410 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2798049.3333333335, ans=0.125
2023-10-09 15:31:16,850 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.615e+02 3.104e+02 3.584e+02 4.093e+02 5.889e+02, threshold=7.167e+02, percent-clipped=0.0
2023-10-09 15:31:19,570 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:31:39,007 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2798142.6666666665, ans=15.0
2023-10-09 15:31:53,468 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2798189.3333333335, ans=0.0
2023-10-09 15:32:00,406 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2798236.0, ans=0.035
2023-10-09 15:32:08,274 INFO [train.py:1031] (2/4) Epoch 14, batch 14900, loss[loss=0.1952, simple_loss=0.253, pruned_loss=0.05027, ctc_loss=0.09184, over 16697.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2728, pruned_loss=0.06451, ctc_loss=0.1131, over 3291408.79 frames. ], batch size: 140, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:32:11,205 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0
2023-10-09 15:32:26,825 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:32:28,245 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0
2023-10-09 15:32:34,553 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2798376.0, ans=0.125
2023-10-09 15:32:34,575 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2798376.0, ans=0.0
2023-10-09 15:33:03,684 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2798469.3333333335, ans=0.0
2023-10-09 15:33:11,268 INFO [train.py:1031] (2/4) Epoch 14, batch 14950, loss[loss=0.2703, simple_loss=0.3038, pruned_loss=0.08706, ctc_loss=0.1568, over 16525.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2725, pruned_loss=0.06353, ctc_loss=0.1117, over 3293399.12 frames. ], batch size: 416, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:33:16,693 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=22.5
2023-10-09 15:33:22,494 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.44 vs. limit=15.0
2023-10-09 15:33:25,938 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.266e+02 3.074e+02 3.344e+02 3.882e+02 5.335e+02, threshold=6.688e+02, percent-clipped=0.0
2023-10-09 15:33:26,783 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0
2023-10-09 15:33:28,992 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2798562.6666666665, ans=0.125
2023-10-09 15:33:42,044 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2798609.3333333335, ans=0.0
2023-10-09 15:34:02,631 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2798702.6666666665, ans=0.2
2023-10-09 15:34:06,941 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0
2023-10-09 15:34:08,609 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2798702.6666666665, ans=0.125
2023-10-09 15:34:09,633 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2798702.6666666665, ans=0.1
2023-10-09 15:34:09,657 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2798702.6666666665, ans=0.125
2023-10-09 15:34:11,396 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2798702.6666666665, ans=0.125
2023-10-09 15:34:13,088 INFO [train.py:1031] (2/4) Epoch 14, batch 15000, loss[loss=0.1918, simple_loss=0.2482, pruned_loss=0.05001, ctc_loss=0.08874, over 16828.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2755, pruned_loss=0.06359, ctc_loss=0.1122, over 3288194.75 frames. ], batch size: 164, lr: 2.57e-03, grad_scale: 4.0
2023-10-09 15:34:13,088 INFO [train.py:1054] (2/4) Computing validation loss
2023-10-09 15:34:21,709 INFO [zipformer.py:1853] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.0106, 2.3342, 4.4402, 1.7741], device='cuda:2')
2023-10-09 15:34:29,055 INFO [zipformer.py:1853] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.5001, 3.3420, 3.0317, 3.0529], device='cuda:2')
2023-10-09 15:34:29,425 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2384, simple_loss=0.3088, pruned_loss=0.06452, ctc_loss=0.09761, over 1796401.00 frames.
2023-10-09 15:34:29,426 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB
2023-10-09 15:34:36,076 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.09 vs. limit=15.0
2023-10-09 15:34:56,587 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.97 vs. limit=10.0
2023-10-09 15:35:01,310 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:35:17,439 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2798889.3333333335, ans=0.125
2023-10-09 15:35:20,338 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2798936.0, ans=0.0
2023-10-09 15:35:21,490 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. limit=10.0
2023-10-09 15:35:32,335 INFO [train.py:1031] (2/4) Epoch 14, batch 15050, loss[loss=0.2022, simple_loss=0.2491, pruned_loss=0.0585, ctc_loss=0.09586, over 16678.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2726, pruned_loss=0.06138, ctc_loss=0.1081, over 3284707.33 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 1.0
2023-10-09 15:35:36,583 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=12.0
2023-10-09 15:35:40,125 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2798982.6666666665, ans=0.125
2023-10-09 15:35:46,329 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0
2023-10-09 15:35:49,234 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.412e+02 3.126e+02 3.487e+02 4.278e+02 6.504e+02, threshold=6.973e+02, percent-clipped=0.0
2023-10-09 15:35:52,368 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2799029.3333333335, ans=0.07
2023-10-09 15:35:56,080 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2799076.0, ans=0.125
2023-10-09 15:35:58,292 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2799076.0, ans=0.1
2023-10-09 15:35:59,396 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2799076.0, ans=0.09899494936611666
2023-10-09 15:36:07,468 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2799076.0, ans=0.05
2023-10-09 15:36:35,030 INFO [train.py:1031] (2/4) Epoch 14, batch 15100, loss[loss=0.247, simple_loss=0.2934, pruned_loss=0.07593, ctc_loss=0.1221, over 16757.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2765, pruned_loss=0.0626, ctc_loss=0.1092, over 3282359.11 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:37:09,033 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2799309.3333333335, ans=0.125
2023-10-09 15:37:17,852 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2799356.0, ans=0.0
2023-10-09 15:37:26,473 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2799402.6666666665, ans=0.125
2023-10-09 15:37:29,756 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2799402.6666666665, ans=0.0
2023-10-09 15:37:31,471 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2799402.6666666665, ans=0.09899494936611666
2023-10-09 15:37:37,649 INFO [train.py:1031] (2/4) Epoch 14, batch 15150, loss[loss=0.2697, simple_loss=0.3213, pruned_loss=0.08022, ctc_loss=0.1442, over 16741.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2833, pruned_loss=0.06507, ctc_loss=0.1134, over 3289326.95 frames. ], batch size: 328, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:37:41,047 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2799449.3333333335, ans=0.0
2023-10-09 15:37:53,167 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.05 vs. limit=22.5
2023-10-09 15:37:55,144 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 3.319e+02 4.410e+02 5.242e+02 1.151e+03, threshold=8.819e+02, percent-clipped=3.0
2023-10-09 15:38:01,056 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=15.0
2023-10-09 15:38:15,950 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2799589.3333333335, ans=0.0
2023-10-09 15:38:16,985 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2799589.3333333335, ans=0.125
2023-10-09 15:38:17,008 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2799589.3333333335, ans=0.125
2023-10-09 15:38:30,046 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2799636.0, ans=0.125
2023-10-09 15:38:38,456 INFO [train.py:1031] (2/4) Epoch 14, batch 15200, loss[loss=0.2124, simple_loss=0.3046, pruned_loss=0.0441, ctc_loss=0.07999, over 15245.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2819, pruned_loss=0.06301, ctc_loss=0.1099, over 3284985.35 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 2.0
2023-10-09 15:39:35,116 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2799869.3333333335, ans=0.125
2023-10-09 15:39:40,019 INFO [train.py:1031] (2/4) Epoch 14, batch 15250, loss[loss=0.2077, simple_loss=0.2773, pruned_loss=0.05213, ctc_loss=0.08464, over 16795.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.2791, pruned_loss=0.06054, ctc_loss=0.1053, over 3287013.58 frames. ], batch size: 176, lr: 2.57e-03, grad_scale: 2.0
], batch size: 176, lr: 2.57e-03, grad_scale: 2.0 2023-10-09 15:39:47,982 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.35 vs. limit=15.0 2023-10-09 15:39:51,708 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2799962.6666666665, ans=0.0 2023-10-09 15:39:53,845 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2799962.6666666665, ans=0.125 2023-10-09 15:39:58,266 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.655e+02 2.982e+02 3.898e+02 5.868e+02, threshold=5.964e+02, percent-clipped=0.0 2023-10-09 15:40:42,819 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2800102.6666666665, ans=0.95 2023-10-09 15:40:44,710 INFO [train.py:1031] (2/4) Epoch 14, batch 15300, loss[loss=0.1905, simple_loss=0.2654, pruned_loss=0.04171, ctc_loss=0.0804, over 15168.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2744, pruned_loss=0.05572, ctc_loss=0.09743, over 3292537.11 frames. ], batch size: 527, lr: 2.57e-03, grad_scale: 4.0 2023-10-09 15:40:55,617 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2800149.3333333335, ans=0.125 2023-10-09 15:41:04,414 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2800196.0, ans=0.1 2023-10-09 15:41:11,015 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2800242.6666666665, ans=0.0 2023-10-09 15:41:15,075 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2800242.6666666665, ans=0.125 2023-10-09 15:41:24,193 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.65 vs. limit=15.0 2023-10-09 15:41:25,961 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2800289.3333333335, ans=0.2 2023-10-09 15:41:27,059 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:41:30,996 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2800289.3333333335, ans=0.1 2023-10-09 15:41:39,725 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.05 vs. 
limit=15.0 2023-10-09 15:41:45,455 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2800336.0, ans=0.125 2023-10-09 15:41:45,465 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2800336.0, ans=0.1 2023-10-09 15:41:45,467 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2800336.0, ans=0.0 2023-10-09 15:41:48,169 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2800382.6666666665, ans=0.125 2023-10-09 15:41:48,938 INFO [train.py:1031] (2/4) Epoch 14, batch 15350, loss[loss=0.2756, simple_loss=0.318, pruned_loss=0.08865, ctc_loss=0.1396, over 16674.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.2793, pruned_loss=0.05887, ctc_loss=0.103, over 3290734.66 frames. ], batch size: 111, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 15:41:49,729 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.94 vs. limit=15.0 2023-10-09 15:41:53,942 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2800382.6666666665, ans=0.125 2023-10-09 15:41:55,350 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. limit=10.0 2023-10-09 15:42:09,088 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+02 2.939e+02 3.401e+02 4.199e+02 7.970e+02, threshold=6.801e+02, percent-clipped=2.0 2023-10-09 15:42:09,475 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2800429.3333333335, ans=0.0 2023-10-09 15:42:33,982 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-10-09 15:42:53,681 INFO [train.py:1031] (2/4) Epoch 14, batch 15400, loss[loss=0.1903, simple_loss=0.2548, pruned_loss=0.04635, ctc_loss=0.08266, over 16840.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2853, pruned_loss=0.05967, ctc_loss=0.1046, over 3293520.50 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 8.0 2023-10-09 15:42:54,411 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.91 vs. limit=10.0 2023-10-09 15:43:02,698 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2800616.0, ans=0.2 2023-10-09 15:43:05,081 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2023-10-09 15:43:17,871 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2800709.3333333335, ans=0.0 2023-10-09 15:43:48,581 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2800802.6666666665, ans=0.125 2023-10-09 15:43:56,846 INFO [train.py:1031] (2/4) Epoch 14, batch 15450, loss[loss=0.2067, simple_loss=0.267, pruned_loss=0.05517, ctc_loss=0.08997, over 16907.00 frames. 
], tot_loss[loss=0.218, simple_loss=0.2787, pruned_loss=0.05835, ctc_loss=0.1012, over 3292704.25 frames. ], batch size: 258, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 15:44:02,346 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2800849.3333333335, ans=0.125 2023-10-09 15:44:17,376 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.133e+02 3.271e+02 3.987e+02 5.026e+02 8.046e+02, threshold=7.973e+02, percent-clipped=4.0 2023-10-09 15:44:32,718 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2800942.6666666665, ans=0.0 2023-10-09 15:44:37,181 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2800989.3333333335, ans=0.0 2023-10-09 15:44:51,502 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2801036.0, ans=0.1 2023-10-09 15:44:57,027 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2801036.0, ans=0.125 2023-10-09 15:45:00,348 INFO [train.py:1031] (2/4) Epoch 14, batch 15500, loss[loss=0.2312, simple_loss=0.2962, pruned_loss=0.06225, ctc_loss=0.1042, over 16569.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.2753, pruned_loss=0.058, ctc_loss=0.0996, over 3294989.14 frames. ], batch size: 351, lr: 2.56e-03, grad_scale: 8.0 2023-10-09 15:45:26,671 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2801176.0, ans=0.125 2023-10-09 15:45:29,816 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2801176.0, ans=0.125 2023-10-09 15:45:29,865 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 15:45:32,901 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2801176.0, ans=0.0 2023-10-09 15:45:59,988 INFO [train.py:1031] (2/4) Epoch 14, batch 15550, loss[loss=0.2746, simple_loss=0.3172, pruned_loss=0.08604, ctc_loss=0.1501, over 16716.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.2755, pruned_loss=0.05901, ctc_loss=0.1008, over 3299048.84 frames. ], batch size: 328, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 15:46:06,057 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2801316.0, ans=0.125 2023-10-09 15:46:09,224 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2801316.0, ans=0.0 2023-10-09 15:46:14,892 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.83 vs. 
2023-10-09 15:46:15,646 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2801362.6666666665, ans=0.1
2023-10-09 15:46:20,231 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:46:22,076 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.543e+02 3.223e+02 3.587e+02 4.203e+02 7.757e+02, threshold=7.174e+02, percent-clipped=0.0
2023-10-09 15:46:27,723 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2801409.3333333335, ans=0.05
2023-10-09 15:46:59,413 INFO [train.py:1031] (2/4) Epoch 14, batch 15600, loss[loss=0.2212, simple_loss=0.2954, pruned_loss=0.05314, ctc_loss=0.1017, over 16858.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2822, pruned_loss=0.06247, ctc_loss=0.1071, over 3302380.86 frames. ], batch size: 242, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:47:07,823 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0
2023-10-09 15:47:08,769 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.68 vs. limit=6.0
2023-10-09 15:47:20,850 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2801596.0, ans=0.125
2023-10-09 15:47:22,064 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2801596.0, ans=0.125
2023-10-09 15:47:35,008 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2801689.3333333335, ans=0.125
2023-10-09 15:47:46,232 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2801689.3333333335, ans=0.1
2023-10-09 15:47:48,287 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2801736.0, ans=0.125
2023-10-09 15:47:55,346 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.75 vs. limit=6.0
2023-10-09 15:47:55,461 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.99 vs. limit=15.0
2023-10-09 15:48:00,433 INFO [train.py:1031] (2/4) Epoch 14, batch 15650, loss[loss=0.2039, simple_loss=0.2549, pruned_loss=0.05712, ctc_loss=0.09653, over 16881.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2832, pruned_loss=0.06068, ctc_loss=0.1052, over 3309230.47 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 1.0
2023-10-09 15:48:07,246 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2801782.6666666665, ans=0.125
2023-10-09 15:48:17,980 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2801829.3333333335, ans=0.125
2023-10-09 15:48:23,318 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.292e+02 3.056e+02 3.461e+02 4.046e+02 6.916e+02, threshold=6.921e+02, percent-clipped=0.0
2023-10-09 15:48:23,740 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2801876.0, ans=0.125
2023-10-09 15:48:29,069 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2801876.0, ans=0.125
2023-10-09 15:48:31,346 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2801876.0, ans=0.0
2023-10-09 15:48:35,622 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2801922.6666666665, ans=0.1
2023-10-09 15:48:35,797 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=2801922.6666666665, ans=22.5
2023-10-09 15:49:00,033 INFO [train.py:1031] (2/4) Epoch 14, batch 15700, loss[loss=0.2351, simple_loss=0.2781, pruned_loss=0.07101, ctc_loss=0.1253, over 16736.00 frames. ], tot_loss[loss=0.2214, simple_loss=0.2782, pruned_loss=0.06112, ctc_loss=0.1059, over 3295411.93 frames. ], batch size: 328, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:49:03,530 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=12.0
2023-10-09 15:49:12,744 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2802062.6666666665, ans=0.0
2023-10-09 15:49:20,097 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2802062.6666666665, ans=0.1
2023-10-09 15:49:21,062 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2802062.6666666665, ans=0.0
2023-10-09 15:49:22,197 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2802062.6666666665, ans=0.0
2023-10-09 15:49:25,996 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2802109.3333333335, ans=0.125
2023-10-09 15:49:38,431 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2802156.0, ans=0.0
2023-10-09 15:49:42,206 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=2802156.0, ans=0.1
2023-10-09 15:49:52,669 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=22.5
2023-10-09 15:49:55,055 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.66 vs. limit=6.0
2023-10-09 15:49:56,751 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2802202.6666666665, ans=15.0
2023-10-09 15:49:57,717 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.62 vs. limit=12.0
2023-10-09 15:50:01,939 INFO [train.py:1031] (2/4) Epoch 14, batch 15750, loss[loss=0.2069, simple_loss=0.2502, pruned_loss=0.06147, ctc_loss=0.1018, over 16821.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2731, pruned_loss=0.06117, ctc_loss=0.106, over 3298586.32 frames. ], batch size: 141, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:50:03,303 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2802249.3333333335, ans=0.125
2023-10-09 15:50:06,056 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2802249.3333333335, ans=0.125
2023-10-09 15:50:26,227 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 3.013e+02 3.498e+02 4.173e+02 6.687e+02, threshold=6.996e+02, percent-clipped=0.0
2023-10-09 15:50:43,210 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2802389.3333333335, ans=0.07
2023-10-09 15:50:45,329 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2802389.3333333335, ans=0.2
2023-10-09 15:50:51,132 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2802436.0, ans=0.2
2023-10-09 15:50:53,891 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2802436.0, ans=0.1
2023-10-09 15:51:00,535 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2802436.0, ans=0.0
2023-10-09 15:51:01,604 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2802436.0, ans=0.2
2023-10-09 15:51:03,854 INFO [train.py:1031] (2/4) Epoch 14, batch 15800, loss[loss=0.1864, simple_loss=0.2895, pruned_loss=0.02931, ctc_loss=0.06183, over 15237.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2692, pruned_loss=0.05936, ctc_loss=0.1034, over 3288905.32 frames. ], batch size: 526, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:51:14,776 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2802482.6666666665, ans=0.125
2023-10-09 15:51:16,583 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2802529.3333333335, ans=0.0
2023-10-09 15:51:18,659 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2802529.3333333335, ans=0.125
2023-10-09 15:51:28,991 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=2802576.0, ans=0.2
2023-10-09 15:51:44,600 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:51:52,098 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 15:51:59,079 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2802669.3333333335, ans=0.125
2023-10-09 15:52:09,231 INFO [train.py:1031] (2/4) Epoch 14, batch 15850, loss[loss=0.2516, simple_loss=0.3134, pruned_loss=0.07136, ctc_loss=0.1178, over 16872.00 frames. ], tot_loss[loss=0.2141, simple_loss=0.2719, pruned_loss=0.05798, ctc_loss=0.1009, over 3292851.85 frames. ], batch size: 292, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:52:20,422 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2802716.0, ans=0.125
2023-10-09 15:52:32,819 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0
2023-10-09 15:52:36,123 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.410e+02 3.156e+02 3.985e+02 5.059e+02 1.038e+03, threshold=7.970e+02, percent-clipped=10.0
2023-10-09 15:52:57,032 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2802856.0, ans=0.2
2023-10-09 15:52:58,066 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2802856.0, ans=0.125
2023-10-09 15:53:02,838 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2802902.6666666665, ans=0.1
2023-10-09 15:53:12,701 INFO [train.py:1031] (2/4) Epoch 14, batch 15900, loss[loss=0.2319, simple_loss=0.3282, pruned_loss=0.04988, ctc_loss=0.08976, over 15094.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2728, pruned_loss=0.05728, ctc_loss=0.09965, over 3294370.21 frames. ], batch size: 527, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:53:20,956 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.25 vs. limit=10.0
2023-10-09 15:53:29,216 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2802996.0, ans=0.2
2023-10-09 15:53:36,043 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0
2023-10-09 15:53:51,633 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2803089.3333333335, ans=0.125
2023-10-09 15:53:57,500 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2803089.3333333335, ans=0.0
2023-10-09 15:54:14,355 INFO [train.py:1031] (2/4) Epoch 14, batch 15950, loss[loss=0.2136, simple_loss=0.2642, pruned_loss=0.0616, ctc_loss=0.09944, over 16854.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2727, pruned_loss=0.05802, ctc_loss=0.101, over 3292558.89 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:54:24,112 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2803182.6666666665, ans=0.0
2023-10-09 15:54:30,405 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2803229.3333333335, ans=0.125
2023-10-09 15:54:34,939 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2803229.3333333335, ans=0.0
2023-10-09 15:54:41,756 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 3.015e+02 3.467e+02 4.153e+02 6.024e+02, threshold=6.935e+02, percent-clipped=0.0
2023-10-09 15:54:58,135 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2803322.6666666665, ans=0.125
2023-10-09 15:55:14,421 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2803369.3333333335, ans=0.0
2023-10-09 15:55:15,442 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2803416.0, ans=0.05
2023-10-09 15:55:16,768 INFO [train.py:1031] (2/4) Epoch 14, batch 16000, loss[loss=0.2507, simple_loss=0.3013, pruned_loss=0.07407, ctc_loss=0.1299, over 16663.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2806, pruned_loss=0.06253, ctc_loss=0.1088, over 3296744.31 frames. ], batch size: 130, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:55:27,571 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2803416.0, ans=0.09899494936611666
2023-10-09 15:55:42,335 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=2803509.3333333335, ans=22.5
2023-10-09 15:56:13,088 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2803602.6666666665, ans=0.125
2023-10-09 15:56:19,092 INFO [train.py:1031] (2/4) Epoch 14, batch 16050, loss[loss=0.2022, simple_loss=0.3149, pruned_loss=0.03121, ctc_loss=0.06756, over 16245.00 frames. ], tot_loss[loss=0.2317, simple_loss=0.2904, pruned_loss=0.0639, ctc_loss=0.1128, over 3289713.32 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:56:23,204 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2803649.3333333335, ans=0.1
2023-10-09 15:56:27,586 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=2803649.3333333335, ans=0.1
2023-10-09 15:56:29,313 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2803649.3333333335, ans=0.0
2023-10-09 15:56:48,623 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.270e+02 3.378e+02 4.238e+02 4.995e+02 7.928e+02, threshold=8.476e+02, percent-clipped=3.0
2023-10-09 15:56:55,549 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2803789.3333333335, ans=0.125
2023-10-09 15:57:05,456 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.18 vs. limit=22.5
2023-10-09 15:57:18,651 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2803836.0, ans=0.125
2023-10-09 15:57:21,612 INFO [train.py:1031] (2/4) Epoch 14, batch 16100, loss[loss=0.2368, simple_loss=0.2933, pruned_loss=0.06622, ctc_loss=0.1194, over 16874.00 frames. ], tot_loss[loss=0.2317, simple_loss=0.2917, pruned_loss=0.06338, ctc_loss=0.1125, over 3292114.38 frames. ], batch size: 242, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:57:23,092 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2803882.6666666665, ans=0.125
2023-10-09 15:57:58,749 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2804022.6666666665, ans=0.125
2023-10-09 15:58:05,154 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2804022.6666666665, ans=0.125
2023-10-09 15:58:14,360 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.68 vs. limit=10.0
2023-10-09 15:58:23,845 INFO [train.py:1031] (2/4) Epoch 14, batch 16150, loss[loss=0.2771, simple_loss=0.3287, pruned_loss=0.08342, ctc_loss=0.1467, over 16820.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.2935, pruned_loss=0.06528, ctc_loss=0.1154, over 3296038.31 frames. ], batch size: 329, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 15:58:29,701 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2804116.0, ans=0.125
2023-10-09 15:58:30,841 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2804116.0, ans=0.125
2023-10-09 15:58:34,998 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2804162.6666666665, ans=0.125
2023-10-09 15:58:38,193 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.86 vs. limit=15.0
2023-10-09 15:58:53,759 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+02 3.176e+02 3.660e+02 4.435e+02 1.361e+03, threshold=7.321e+02, percent-clipped=1.0
2023-10-09 15:58:54,565 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.45 vs. limit=12.0
2023-10-09 15:59:05,955 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0
2023-10-09 15:59:11,526 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2804302.6666666665, ans=0.125
2023-10-09 15:59:24,055 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2804349.3333333335, ans=0.125
2023-10-09 15:59:24,218 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2804349.3333333335, ans=0.0
2023-10-09 15:59:24,939 INFO [train.py:1031] (2/4) Epoch 14, batch 16200, loss[loss=0.1902, simple_loss=0.2457, pruned_loss=0.05007, ctc_loss=0.08629, over 16822.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2878, pruned_loss=0.0638, ctc_loss=0.1126, over 3301212.66 frames. ], batch size: 243, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 15:59:28,750 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2804349.3333333335, ans=0.0
2023-10-09 15:59:42,603 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2804396.0, ans=0.1
2023-10-09 15:59:49,868 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.56 vs. limit=10.0
2023-10-09 16:00:08,100 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2804489.3333333335, ans=0.0
2023-10-09 16:00:18,019 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2804536.0, ans=0.125
2023-10-09 16:00:27,729 INFO [train.py:1031] (2/4) Epoch 14, batch 16250, loss[loss=0.2479, simple_loss=0.3229, pruned_loss=0.06353, ctc_loss=0.1144, over 16637.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.2818, pruned_loss=0.06239, ctc_loss=0.1102, over 3309972.30 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:00:58,613 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.213e+02 3.037e+02 3.428e+02 4.095e+02 1.009e+03, threshold=6.855e+02, percent-clipped=2.0
2023-10-09 16:01:30,634 INFO [train.py:1031] (2/4) Epoch 14, batch 16300, loss[loss=0.1956, simple_loss=0.255, pruned_loss=0.04985, ctc_loss=0.09107, over 16078.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2801, pruned_loss=0.05953, ctc_loss=0.1055, over 3313197.16 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:01:41,583 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2804862.6666666665, ans=0.0
2023-10-09 16:01:54,948 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:02:09,157 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2804956.0, ans=0.0
2023-10-09 16:02:18,844 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0
2023-10-09 16:02:29,000 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2805002.6666666665, ans=0.125
2023-10-09 16:02:30,658 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2805049.3333333335, ans=0.125
2023-10-09 16:02:31,524 INFO [train.py:1031] (2/4) Epoch 14, batch 16350, loss[loss=0.2143, simple_loss=0.2768, pruned_loss=0.05585, ctc_loss=0.1006, over 16818.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.2752, pruned_loss=0.05859, ctc_loss=0.1034, over 3307069.96 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:02:40,261 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2805049.3333333335, ans=0.2
2023-10-09 16:02:41,822 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2805049.3333333335, ans=0.125
2023-10-09 16:02:48,581 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2805096.0, ans=0.125
2023-10-09 16:03:01,698 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.282e+02 3.121e+02 3.548e+02 4.178e+02 8.324e+02, threshold=7.096e+02, percent-clipped=2.0
2023-10-09 16:03:04,799 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2805142.6666666665, ans=0.125
2023-10-09 16:03:13,347 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2805189.3333333335, ans=0.125
2023-10-09 16:03:19,311 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2805236.0, ans=0.1
2023-10-09 16:03:32,990 INFO [train.py:1031] (2/4) Epoch 14, batch 16400, loss[loss=0.206, simple_loss=0.2739, pruned_loss=0.05135, ctc_loss=0.08841, over 16821.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.2758, pruned_loss=0.06003, ctc_loss=0.1058, over 3311875.91 frames. ], batch size: 95, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:03:39,517 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2805282.6666666665, ans=0.125
2023-10-09 16:03:39,577 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2805282.6666666665, ans=0.2
2023-10-09 16:03:41,687 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2805282.6666666665, ans=0.0
2023-10-09 16:03:47,587 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2805329.3333333335, ans=0.125
2023-10-09 16:03:56,728 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2805376.0, ans=0.125
2023-10-09 16:04:13,554 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2805422.6666666665, ans=15.0
2023-10-09 16:04:15,476 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2805422.6666666665, ans=0.0
2023-10-09 16:04:34,782 INFO [train.py:1031] (2/4) Epoch 14, batch 16450, loss[loss=0.2134, simple_loss=0.259, pruned_loss=0.06201, ctc_loss=0.1094, over 16806.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.275, pruned_loss=0.06154, ctc_loss=0.1081, over 3318789.50 frames. ], batch size: 310, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:04:35,099 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2805516.0, ans=0.0
2023-10-09 16:04:40,347 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2805516.0, ans=0.2
2023-10-09 16:04:52,975 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2805562.6666666665, ans=0.125
2023-10-09 16:05:06,536 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.659e+02 3.324e+02 3.650e+02 4.238e+02 1.011e+03, threshold=7.299e+02, percent-clipped=2.0
2023-10-09 16:05:35,698 INFO [train.py:1031] (2/4) Epoch 14, batch 16500, loss[loss=0.1801, simple_loss=0.228, pruned_loss=0.04939, ctc_loss=0.08367, over 16527.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2707, pruned_loss=0.06158, ctc_loss=0.1081, over 3310702.38 frames. ], batch size: 110, lr: 2.56e-03, grad_scale: 8.0
2023-10-09 16:05:59,457 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2805842.6666666665, ans=0.0
2023-10-09 16:06:02,166 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2805842.6666666665, ans=0.0
2023-10-09 16:06:14,870 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2805889.3333333335, ans=0.0
2023-10-09 16:06:16,659 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=12.0
2023-10-09 16:06:22,218 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2805889.3333333335, ans=0.125
2023-10-09 16:06:23,357 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2805936.0, ans=0.2
2023-10-09 16:06:29,688 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2805936.0, ans=0.125
2023-10-09 16:06:32,462 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2805936.0, ans=0.125
2023-10-09 16:06:35,199 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2805936.0, ans=0.0
2023-10-09 16:06:37,060 INFO [train.py:1031] (2/4) Epoch 14, batch 16550, loss[loss=0.1969, simple_loss=0.2621, pruned_loss=0.04856, ctc_loss=0.08621, over 16748.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2719, pruned_loss=0.06102, ctc_loss=0.1071, over 3291533.32 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:06:37,432 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2805982.6666666665, ans=0.125
2023-10-09 16:06:42,193 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2805982.6666666665, ans=0.0
2023-10-09 16:07:01,017 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2806076.0, ans=0.125
2023-10-09 16:07:09,904 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.507e+02 3.011e+02 3.365e+02 4.120e+02 6.132e+02, threshold=6.730e+02, percent-clipped=0.0
2023-10-09 16:07:23,816 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2806122.6666666665, ans=0.125
2023-10-09 16:07:25,797 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2806169.3333333335, ans=0.0
2023-10-09 16:07:37,224 INFO [train.py:1031] (2/4) Epoch 14, batch 16600, loss[loss=0.1954, simple_loss=0.245, pruned_loss=0.055, ctc_loss=0.08932, over 16921.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.2687, pruned_loss=0.06129, ctc_loss=0.1072, over 3298558.57 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 8.0
2023-10-09 16:07:42,391 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2806216.0, ans=0.125
2023-10-09 16:07:46,295 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2806216.0, ans=0.125
2023-10-09 16:07:47,418 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=2806216.0, ans=0.2
2023-10-09 16:07:58,080 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2806262.6666666665, ans=10.0
2023-10-09 16:08:12,106 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.83 vs. limit=10.0
2023-10-09 16:08:12,886 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2806356.0, ans=0.1
2023-10-09 16:08:39,071 INFO [train.py:1031] (2/4) Epoch 14, batch 16650, loss[loss=0.1724, simple_loss=0.238, pruned_loss=0.03927, ctc_loss=0.07037, over 16927.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2708, pruned_loss=0.06007, ctc_loss=0.1055, over 3302225.35 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:08:57,087 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2806496.0, ans=0.125
2023-10-09 16:08:57,642 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.07 vs. limit=22.5
2023-10-09 16:09:00,360 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0
2023-10-09 16:09:03,236 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2806542.6666666665, ans=0.125
2023-10-09 16:09:15,076 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.506e+02 2.878e+02 3.292e+02 3.921e+02 8.519e+02, threshold=6.584e+02, percent-clipped=3.0
2023-10-09 16:09:25,790 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2806589.3333333335, ans=0.125
2023-10-09 16:09:36,557 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2806636.0, ans=0.125
2023-10-09 16:09:40,523 INFO [train.py:1031] (2/4) Epoch 14, batch 16700, loss[loss=0.2112, simple_loss=0.2611, pruned_loss=0.06137, ctc_loss=0.09621, over 16952.00 frames. ], tot_loss[loss=0.2159, simple_loss=0.2681, pruned_loss=0.06063, ctc_loss=0.106, over 3304209.01 frames. ], batch size: 78, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:09:45,764 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2806682.6666666665, ans=0.2
2023-10-09 16:09:55,945 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=12.0
2023-10-09 16:10:01,552 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2806729.3333333335, ans=0.125
2023-10-09 16:10:01,625 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2806729.3333333335, ans=0.125
2023-10-09 16:10:02,771 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2806729.3333333335, ans=0.0
2023-10-09 16:10:08,549 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2806776.0, ans=0.0
2023-10-09 16:10:21,963 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2806822.6666666665, ans=0.125
2023-10-09 16:10:23,064 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2806822.6666666665, ans=0.0
2023-10-09 16:10:29,173 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2806869.3333333335, ans=0.125
2023-10-09 16:10:42,243 INFO [train.py:1031] (2/4) Epoch 14, batch 16750, loss[loss=0.2111, simple_loss=0.281, pruned_loss=0.05208, ctc_loss=0.09262, over 16851.00 frames. ], tot_loss[loss=0.2158, simple_loss=0.2678, pruned_loss=0.06065, ctc_loss=0.106, over 3311486.55 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:10:55,905 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2806962.6666666665, ans=0.0
2023-10-09 16:11:07,361 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=12.0
2023-10-09 16:11:18,573 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.049e+02 3.546e+02 4.303e+02 6.611e+02, threshold=7.093e+02, percent-clipped=1.0
2023-10-09 16:11:40,147 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2807102.6666666665, ans=0.025
2023-10-09 16:11:42,923 INFO [train.py:1031] (2/4) Epoch 14, batch 16800, loss[loss=0.2467, simple_loss=0.2883, pruned_loss=0.07536, ctc_loss=0.1358, over 16631.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2708, pruned_loss=0.06045, ctc_loss=0.1057, over 3321803.76 frames. ], batch size: 351, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:12:45,118 INFO [train.py:1031] (2/4) Epoch 14, batch 16850, loss[loss=0.2145, simple_loss=0.2621, pruned_loss=0.06272, ctc_loss=0.1036, over 16587.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2716, pruned_loss=0.06122, ctc_loss=0.1072, over 3309538.41 frames. ], batch size: 110, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:13:03,394 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2807429.3333333335, ans=0.125
2023-10-09 16:13:18,377 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2807476.0, ans=0.125
2023-10-09 16:13:21,008 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2807476.0, ans=0.1
2023-10-09 16:13:23,167 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2807522.6666666665, ans=0.125
2023-10-09 16:13:24,926 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.517e+02 3.198e+02 3.748e+02 4.342e+02 8.032e+02, threshold=7.496e+02, percent-clipped=3.0
2023-10-09 16:13:27,081 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2807522.6666666665, ans=0.125
2023-10-09 16:13:28,023 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2807522.6666666665, ans=0.125
2023-10-09 16:13:30,890 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2807522.6666666665, ans=0.0
2023-10-09 16:13:35,098 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.31 vs. limit=10.0
2023-10-09 16:13:38,350 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2807569.3333333335, ans=0.0
2023-10-09 16:13:39,761 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=12.0
2023-10-09 16:13:48,642 INFO [train.py:1031] (2/4) Epoch 14, batch 16900, loss[loss=0.2259, simple_loss=0.2869, pruned_loss=0.06146, ctc_loss=0.1053, over 16851.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2745, pruned_loss=0.06047, ctc_loss=0.1063, over 3301163.02 frames. ], batch size: 242, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:13:50,072 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 16:13:52,833 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2807616.0, ans=0.1
2023-10-09 16:13:58,351 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.78 vs. limit=15.0
2023-10-09 16:14:00,670 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2807662.6666666665, ans=0.2
2023-10-09 16:14:39,249 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2807802.6666666665, ans=0.09899494936611666
2023-10-09 16:14:40,225 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2807802.6666666665, ans=0.035
2023-10-09 16:14:51,579 INFO [train.py:1031] (2/4) Epoch 14, batch 16950, loss[loss=0.2398, simple_loss=0.2755, pruned_loss=0.07481, ctc_loss=0.1363, over 15244.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2801, pruned_loss=0.06231, ctc_loss=0.1097, over 3304448.05 frames. ], batch size: 527, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:15:12,751 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2807896.0, ans=0.0
2023-10-09 16:15:12,811 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2807896.0, ans=0.125
2023-10-09 16:15:14,900 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.36 vs. limit=15.0
2023-10-09 16:15:25,368 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2807942.6666666665, ans=0.2
2023-10-09 16:15:33,196 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.515e+02 3.296e+02 3.627e+02 4.465e+02 8.431e+02, threshold=7.254e+02, percent-clipped=3.0
2023-10-09 16:15:35,632 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2807989.3333333335, ans=0.0
2023-10-09 16:15:46,002 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.90 vs. limit=22.5
2023-10-09 16:15:52,890 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2808036.0, ans=0.07
2023-10-09 16:15:55,743 INFO [train.py:1031] (2/4) Epoch 14, batch 17000, loss[loss=0.2506, simple_loss=0.319, pruned_loss=0.06663, ctc_loss=0.1224, over 16555.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2845, pruned_loss=0.06341, ctc_loss=0.112, over 3306572.16 frames. ], batch size: 351, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:15:58,837 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2808082.6666666665, ans=0.1
2023-10-09 16:16:01,558 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2808082.6666666665, ans=0.125
2023-10-09 16:16:10,394 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2808129.3333333335, ans=0.125
2023-10-09 16:16:41,010 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2808222.6666666665, ans=0.125
2023-10-09 16:16:41,087 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2808222.6666666665, ans=0.2
2023-10-09 16:16:51,936 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2808269.3333333335, ans=0.0
2023-10-09 16:16:53,089 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2808269.3333333335, ans=0.125
2023-10-09 16:16:59,296 INFO [train.py:1031] (2/4) Epoch 14, batch 17050, loss[loss=0.1993, simple_loss=0.2899, pruned_loss=0.03882, ctc_loss=0.07759, over 15112.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.2828, pruned_loss=0.06199, ctc_loss=0.1095, over 3301076.80 frames. ], batch size: 526, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:16:59,556 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2808316.0, ans=0.125
2023-10-09 16:17:09,483 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2808316.0, ans=0.125
2023-10-09 16:17:13,234 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2808362.6666666665, ans=0.125
2023-10-09 16:17:17,543 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2808362.6666666665, ans=0.125
2023-10-09 16:17:29,403 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0
2023-10-09 16:17:37,277 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2808456.0, ans=0.1
2023-10-09 16:17:41,823 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.562e+02 3.300e+02 3.832e+02 4.647e+02 9.893e+02, threshold=7.664e+02, percent-clipped=3.0
2023-10-09 16:17:51,786 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0
2023-10-09 16:18:02,360 INFO [train.py:1031] (2/4) Epoch 14, batch 17100, loss[loss=0.2348, simple_loss=0.288, pruned_loss=0.06819, ctc_loss=0.1129, over 16841.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2868, pruned_loss=0.06456, ctc_loss=0.1134, over 3303720.39 frames. ], batch size: 164, lr: 2.56e-03, grad_scale: 4.0
2023-10-09 16:18:10,145 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=12.0
2023-10-09 16:18:24,972 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2808596.0, ans=0.2
2023-10-09 16:18:43,565 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2808689.3333333335, ans=0.04949747468305833
2023-10-09 16:18:55,971 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2808736.0, ans=0.2
2023-10-09 16:19:03,699 INFO [train.py:1031] (2/4) Epoch 14, batch 17150, loss[loss=0.2186, simple_loss=0.2957, pruned_loss=0.05179, ctc_loss=0.09476, over 16920.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.287, pruned_loss=0.06315, ctc_loss=0.111, over 3300312.05 frames. ], batch size: 243, lr: 2.56e-03, grad_scale: 2.0
2023-10-09 16:19:06,311 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2808782.6666666665, ans=0.1
2023-10-09 16:19:25,505 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=2808829.3333333335, ans=0.5
2023-10-09 16:19:32,964 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2808876.0, ans=0.125
2023-10-09 16:19:35,536 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=22.5
limit=22.5 2023-10-09 16:19:41,705 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2808922.6666666665, ans=0.2 2023-10-09 16:19:42,836 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2808922.6666666665, ans=0.0 2023-10-09 16:19:46,139 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.277e+02 3.091e+02 3.589e+02 4.240e+02 6.885e+02, threshold=7.178e+02, percent-clipped=0.0 2023-10-09 16:20:04,868 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2809016.0, ans=0.125 2023-10-09 16:20:05,575 INFO [train.py:1031] (2/4) Epoch 14, batch 17200, loss[loss=0.2515, simple_loss=0.3258, pruned_loss=0.06365, ctc_loss=0.1249, over 16743.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.2933, pruned_loss=0.06401, ctc_loss=0.113, over 3308141.39 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:20:23,102 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:20:24,068 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=15.0 2023-10-09 16:20:48,205 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2809156.0, ans=15.0 2023-10-09 16:20:57,504 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2809156.0, ans=0.035 2023-10-09 16:21:00,306 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2809202.6666666665, ans=0.125 2023-10-09 16:21:12,738 INFO [train.py:1031] (2/4) Epoch 14, batch 17250, loss[loss=0.2988, simple_loss=0.3889, pruned_loss=0.07514, ctc_loss=0.1462, over 16804.00 frames. ], tot_loss[loss=0.2457, simple_loss=0.3103, pruned_loss=0.06664, ctc_loss=0.1198, over 3309962.18 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:21:21,396 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2809249.3333333335, ans=0.2 2023-10-09 16:21:40,954 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2809342.6666666665, ans=0.0 2023-10-09 16:21:46,484 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2809342.6666666665, ans=0.1 2023-10-09 16:21:46,634 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.36 vs. 
limit=22.5 2023-10-09 16:21:47,535 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2809342.6666666665, ans=0.05 2023-10-09 16:21:57,826 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.919e+02 4.626e+02 5.820e+02 9.725e+02, threshold=9.252e+02, percent-clipped=7.0 2023-10-09 16:22:04,785 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2809436.0, ans=0.125 2023-10-09 16:22:05,971 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.26 vs. limit=10.0 2023-10-09 16:22:16,149 INFO [train.py:1031] (2/4) Epoch 14, batch 17300, loss[loss=0.2458, simple_loss=0.3354, pruned_loss=0.05763, ctc_loss=0.1023, over 16845.00 frames. ], tot_loss[loss=0.2512, simple_loss=0.3195, pruned_loss=0.06719, ctc_loss=0.1215, over 3306234.03 frames. ], batch size: 242, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:22:27,829 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2809529.3333333335, ans=0.1 2023-10-09 16:22:28,123 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.42 vs. limit=22.5 2023-10-09 16:22:37,757 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2809529.3333333335, ans=0.125 2023-10-09 16:22:50,680 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2809576.0, ans=10.0 2023-10-09 16:23:16,926 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:23:17,641 INFO [train.py:1031] (2/4) Epoch 14, batch 17350, loss[loss=0.2631, simple_loss=0.3394, pruned_loss=0.06915, ctc_loss=0.1216, over 16226.00 frames. ], tot_loss[loss=0.2545, simple_loss=0.3233, pruned_loss=0.06831, ctc_loss=0.1227, over 3305464.77 frames. 
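A note on the bracketed loss fields: every tot_loss entry in this stretch satisfies loss = 0.5 * simple_loss + pruned_loss + 0.2 * ctc_loss; for batch 17350 above, 0.5 * 0.3233 + 0.06831 + 0.2 * 0.1227 = 0.2545. A minimal sketch of that combination follows; the function name and signature are illustrative, not icefall's actual code:

    # Sketch of the linear relation observed in the tot_loss entries above;
    # `combine_losses` and its signature are illustrative, not icefall code.
    def combine_losses(simple_loss: float, pruned_loss: float, ctc_loss: float,
                       simple_loss_scale: float = 0.5,
                       ctc_loss_scale: float = 0.2) -> float:
        return simple_loss_scale * simple_loss + pruned_loss + ctc_loss_scale * ctc_loss

    # Batch 17350 above: 0.5 * 0.3233 + 0.06831 + 0.2 * 0.1227 = 0.2545
    assert abs(combine_losses(0.3233, 0.06831, 0.1227) - 0.2545) < 5e-4
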
], batch size: 463, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:23:24,354 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2809716.0, ans=0.0 2023-10-09 16:23:25,436 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2809716.0, ans=0.125 2023-10-09 16:23:29,919 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2809762.6666666665, ans=0.1 2023-10-09 16:23:34,210 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2809762.6666666665, ans=0.125 2023-10-09 16:23:47,218 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:23:50,851 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2809809.3333333335, ans=0.0 2023-10-09 16:23:51,927 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:23:54,963 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=15.0 2023-10-09 16:24:01,203 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+02 3.229e+02 3.810e+02 5.005e+02 1.294e+03, threshold=7.619e+02, percent-clipped=1.0 2023-10-09 16:24:18,376 INFO [train.py:1031] (2/4) Epoch 14, batch 17400, loss[loss=0.1902, simple_loss=0.2447, pruned_loss=0.05125, ctc_loss=0.0832, over 16787.00 frames. ], tot_loss[loss=0.2478, simple_loss=0.3125, pruned_loss=0.06745, ctc_loss=0.1204, over 3307884.22 frames. ], batch size: 164, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:24:18,609 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2809949.3333333335, ans=0.0 2023-10-09 16:24:32,566 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=2809996.0, ans=0.05 2023-10-09 16:25:15,495 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2810136.0, ans=0.2 2023-10-09 16:25:16,625 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2810136.0, ans=0.0 2023-10-09 16:25:18,380 INFO [train.py:1031] (2/4) Epoch 14, batch 17450, loss[loss=0.216, simple_loss=0.2567, pruned_loss=0.06532, ctc_loss=0.1117, over 16674.00 frames. ], tot_loss[loss=0.2393, simple_loss=0.2995, pruned_loss=0.06606, ctc_loss=0.1175, over 3309744.35 frames. ], batch size: 291, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:25:28,922 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. 
limit=15.0 2023-10-09 16:25:35,544 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2810229.3333333335, ans=0.05 2023-10-09 16:25:42,292 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2810229.3333333335, ans=0.0 2023-10-09 16:26:04,568 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2810322.6666666665, ans=0.125 2023-10-09 16:26:05,313 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.445e+02 3.049e+02 3.427e+02 3.970e+02 9.337e+02, threshold=6.853e+02, percent-clipped=1.0 2023-10-09 16:26:11,085 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2810369.3333333335, ans=0.125 2023-10-09 16:26:20,080 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2810416.0, ans=0.125 2023-10-09 16:26:20,866 INFO [train.py:1031] (2/4) Epoch 14, batch 17500, loss[loss=0.2358, simple_loss=0.2875, pruned_loss=0.06982, ctc_loss=0.1111, over 16776.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.291, pruned_loss=0.06623, ctc_loss=0.1172, over 3310575.27 frames. ], batch size: 95, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:26:35,123 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2810462.6666666665, ans=0.0 2023-10-09 16:26:40,592 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2810462.6666666665, ans=0.0 2023-10-09 16:26:42,959 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=22.5 2023-10-09 16:26:45,480 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2810509.3333333335, ans=0.0 2023-10-09 16:26:58,741 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.85 vs. limit=12.0 2023-10-09 16:27:14,259 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0 2023-10-09 16:27:22,358 INFO [train.py:1031] (2/4) Epoch 14, batch 17550, loss[loss=0.2387, simple_loss=0.2875, pruned_loss=0.0703, ctc_loss=0.1232, over 16854.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2912, pruned_loss=0.06786, ctc_loss=0.1196, over 3297029.98 frames. ], batch size: 242, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:28:12,372 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.113e+02 3.532e+02 4.348e+02 7.721e+02, threshold=7.063e+02, percent-clipped=2.0 2023-10-09 16:28:23,190 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2810836.0, ans=0.035 2023-10-09 16:28:25,639 INFO [train.py:1031] (2/4) Epoch 14, batch 17600, loss[loss=0.2023, simple_loss=0.2601, pruned_loss=0.05421, ctc_loss=0.09017, over 16790.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2932, pruned_loss=0.0662, ctc_loss=0.1171, over 3302469.97 frames. 
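In the recurring "Clipping_scale=2.0, grad-norm quartiles ..." entries, the five numbers read as min/25%/50%/75%/max of recent gradient norms, and the logged threshold is clipping_scale times the median (2.0 * 3.427e+02 ≈ 6.853e+02 in the entry above). A sketch of how such statistics could be assembled; this is an illustration, not the code at optim.py:471:

    import torch

    # Sketch only: reproduces the statistics format of the optim.py lines
    # above from a window of recent per-step gradient norms.
    def grad_norm_stats(recent_norms, clipping_scale=2.0):
        norms = torch.as_tensor(recent_norms, dtype=torch.float32)
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]                 # matches the logged values
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return q.tolist(), threshold.item(), percent_clipped.item()
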
], batch size: 130, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:28:26,400 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.48 vs. limit=15.0 2023-10-09 16:28:55,189 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2023-10-09 16:29:07,916 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2811022.6666666665, ans=0.125 2023-10-09 16:29:07,988 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2811022.6666666665, ans=0.125 2023-10-09 16:29:08,764 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2811022.6666666665, ans=0.125 2023-10-09 16:29:26,298 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2811116.0, ans=0.0 2023-10-09 16:29:27,533 INFO [train.py:1031] (2/4) Epoch 14, batch 17650, loss[loss=0.2732, simple_loss=0.3222, pruned_loss=0.08347, ctc_loss=0.1431, over 16721.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.2918, pruned_loss=0.06457, ctc_loss=0.1139, over 3304797.78 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:29:55,206 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2811209.3333333335, ans=0.2 2023-10-09 16:29:57,254 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=22.5 2023-10-09 16:30:15,476 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2811256.0, ans=0.125 2023-10-09 16:30:17,962 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.137e+02 2.983e+02 3.277e+02 4.147e+02 6.506e+02, threshold=6.554e+02, percent-clipped=0.0 2023-10-09 16:30:26,915 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2811302.6666666665, ans=0.07 2023-10-09 16:30:31,456 INFO [train.py:1031] (2/4) Epoch 14, batch 17700, loss[loss=0.2954, simple_loss=0.3497, pruned_loss=0.08756, ctc_loss=0.165, over 16384.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2928, pruned_loss=0.06162, ctc_loss=0.1096, over 3309251.43 frames. ], batch size: 416, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:30:33,488 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=2811349.3333333335, ans=0.2 2023-10-09 16:31:17,389 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2811489.3333333335, ans=0.125 2023-10-09 16:31:35,982 INFO [train.py:1031] (2/4) Epoch 14, batch 17750, loss[loss=0.2502, simple_loss=0.3335, pruned_loss=0.06036, ctc_loss=0.1153, over 16827.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2915, pruned_loss=0.05974, ctc_loss=0.107, over 3309453.90 frames. ], batch size: 328, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:31:43,549 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.60 vs. 
limit=10.0 2023-10-09 16:31:51,522 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2811629.3333333335, ans=0.0 2023-10-09 16:32:08,279 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2811676.0, ans=15.0 2023-10-09 16:32:09,005 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2811676.0, ans=0.125 2023-10-09 16:32:20,018 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2811722.6666666665, ans=0.125 2023-10-09 16:32:26,656 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.329e+02 3.107e+02 3.479e+02 4.054e+02 7.691e+02, threshold=6.958e+02, percent-clipped=4.0 2023-10-09 16:32:39,786 INFO [train.py:1031] (2/4) Epoch 14, batch 17800, loss[loss=0.1698, simple_loss=0.2437, pruned_loss=0.0354, ctc_loss=0.06264, over 16667.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2911, pruned_loss=0.05748, ctc_loss=0.1037, over 3302974.88 frames. ], batch size: 151, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:33:01,132 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2811862.6666666665, ans=0.0 2023-10-09 16:33:10,710 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=2811909.3333333335, ans=0.1 2023-10-09 16:33:41,468 INFO [train.py:1031] (2/4) Epoch 14, batch 17850, loss[loss=0.1991, simple_loss=0.2519, pruned_loss=0.05427, ctc_loss=0.09458, over 16799.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2848, pruned_loss=0.05621, ctc_loss=0.1011, over 3307270.96 frames. ], batch size: 176, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:33:42,245 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.07 vs. limit=15.0 2023-10-09 16:33:48,707 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2812049.3333333335, ans=0.0 2023-10-09 16:34:20,646 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2812189.3333333335, ans=0.125 2023-10-09 16:34:32,669 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.983e+02 3.516e+02 4.147e+02 7.275e+02, threshold=7.033e+02, percent-clipped=1.0 2023-10-09 16:34:35,636 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2812236.0, ans=0.125 2023-10-09 16:34:43,848 INFO [train.py:1031] (2/4) Epoch 14, batch 17900, loss[loss=0.2539, simple_loss=0.2988, pruned_loss=0.07812, ctc_loss=0.1317, over 16726.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.2788, pruned_loss=0.05757, ctc_loss=0.1029, over 3300927.62 frames. ], batch size: 111, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:35:15,015 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2812376.0, ans=0.0 2023-10-09 16:35:43,155 INFO [train.py:1031] (2/4) Epoch 14, batch 17950, loss[loss=0.2655, simple_loss=0.3012, pruned_loss=0.0843, ctc_loss=0.1532, over 16608.00 frames. 
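The scaling.py:199 lines print the current value ("ans") of a ScheduledFloat: a scalar hyperparameter (a dropout p, a skip rate, a balancer probability) that follows a schedule keyed on batch_count. A hedged sketch of a piecewise-linear version; the schedule points are invented for illustration, and the real ScheduledFloat in icefall's scaling.py carries more machinery:

    # Hedged sketch of a ScheduledFloat-style value: piecewise-linear in
    # batch_count, matching lines like "name=..., batch_count=..., ans=0.125".
    # The schedule points below are made up for illustration.
    def scheduled_float(batch_count, points):
        # points: sorted list of (batch_count, value) pairs
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # past the last point the value stays constant

    prob = scheduled_float(2808316.0, [(0.0, 0.3), (4000.0, 0.125)])  # -> 0.125
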
], tot_loss[loss=0.22, simple_loss=0.278, pruned_loss=0.05976, ctc_loss=0.1061, over 3309124.04 frames. ], batch size: 350, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:35:46,604 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2023-10-09 16:36:16,848 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2812609.3333333335, ans=0.125 2023-10-09 16:36:37,264 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.051e+02 3.955e+02 4.561e+02 5.519e+02 1.023e+03, threshold=9.123e+02, percent-clipped=10.0 2023-10-09 16:36:39,542 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=22.5 2023-10-09 16:36:47,004 INFO [train.py:1031] (2/4) Epoch 14, batch 18000, loss[loss=0.2604, simple_loss=0.3125, pruned_loss=0.0765, ctc_loss=0.1382, over 16853.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2828, pruned_loss=0.06344, ctc_loss=0.1123, over 3308955.07 frames. ], batch size: 309, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:36:47,005 INFO [train.py:1054] (2/4) Computing validation loss 2023-10-09 16:37:05,079 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2359, simple_loss=0.3042, pruned_loss=0.06468, ctc_loss=0.09589, over 1796401.00 frames. 2023-10-09 16:37:05,079 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB 2023-10-09 16:37:06,612 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2812749.3333333335, ans=0.0 2023-10-09 16:37:31,629 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2812842.6666666665, ans=0.125 2023-10-09 16:37:34,327 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2023-10-09 16:37:42,292 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2812842.6666666665, ans=0.125 2023-10-09 16:38:10,373 INFO [train.py:1031] (2/4) Epoch 14, batch 18050, loss[loss=0.259, simple_loss=0.3279, pruned_loss=0.06799, ctc_loss=0.135, over 16655.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2889, pruned_loss=0.06515, ctc_loss=0.1154, over 3306474.33 frames. ], batch size: 351, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:38:37,691 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2813076.0, ans=0.1 2023-10-09 16:38:54,580 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2023-10-09 16:39:06,174 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.695e+02 3.447e+02 3.987e+02 5.015e+02 1.069e+03, threshold=7.973e+02, percent-clipped=1.0 2023-10-09 16:39:14,516 INFO [train.py:1031] (2/4) Epoch 14, batch 18100, loss[loss=0.2346, simple_loss=0.3419, pruned_loss=0.04647, ctc_loss=0.08572, over 15155.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2915, pruned_loss=0.06322, ctc_loss=0.1119, over 3292881.83 frames. 
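The train.py:1054/1063/1064 lines above are the periodic validation pass: training pauses, the validation loss is computed (here over about 1.8M frames), and the peak CUDA memory is reported. The memory figure is standard PyTorch allocator accounting; a minimal way to reproduce that line for this rank's device:

    import torch

    # Peak memory as reported in the "Maximum memory allocated" line;
    # torch.cuda.max_memory_allocated is the standard PyTorch API for this.
    peak_mb = torch.cuda.max_memory_allocated(torch.device("cuda:2")) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")
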
], batch size: 526, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:39:22,503 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2813216.0, ans=0.125 2023-10-09 16:39:29,045 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.49 vs. limit=12.0 2023-10-09 16:39:50,905 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=15.0 2023-10-09 16:40:03,451 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2813402.6666666665, ans=0.125 2023-10-09 16:40:16,838 INFO [train.py:1031] (2/4) Epoch 14, batch 18150, loss[loss=0.2131, simple_loss=0.2641, pruned_loss=0.05905, ctc_loss=0.1102, over 16779.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2879, pruned_loss=0.0623, ctc_loss=0.1101, over 3288954.23 frames. ], batch size: 292, lr: 2.56e-03, grad_scale: 1.0 2023-10-09 16:40:29,574 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2813496.0, ans=0.1 2023-10-09 16:40:47,927 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:40:56,517 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2813589.3333333335, ans=0.125 2023-10-09 16:41:08,817 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2813636.0, ans=0.125 2023-10-09 16:41:12,479 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+02 3.202e+02 3.701e+02 4.396e+02 8.361e+02, threshold=7.403e+02, percent-clipped=2.0 2023-10-09 16:41:17,259 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2813636.0, ans=0.1 2023-10-09 16:41:19,064 INFO [train.py:1031] (2/4) Epoch 14, batch 18200, loss[loss=0.1889, simple_loss=0.2485, pruned_loss=0.04738, ctc_loss=0.08653, over 16784.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.2827, pruned_loss=0.06211, ctc_loss=0.1093, over 3290794.21 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:41:24,451 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2813682.6666666665, ans=0.125 2023-10-09 16:41:42,214 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.53 vs. limit=22.5 2023-10-09 16:42:19,172 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2813869.3333333335, ans=0.0 2023-10-09 16:42:21,166 INFO [train.py:1031] (2/4) Epoch 14, batch 18250, loss[loss=0.1733, simple_loss=0.2406, pruned_loss=0.03865, ctc_loss=0.07184, over 16825.00 frames. ], tot_loss[loss=0.215, simple_loss=0.274, pruned_loss=0.05765, ctc_loss=0.1018, over 3289065.39 frames. 
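The grad_scale field bounces between 1.0, 2.0 and 4.0 across these entries, which is fp16 dynamic loss scaling at work: the scale is halved when a step overflows and grown back after a run of stable steps. icefall uses its own scaler variant, but the behaviour matches PyTorch's stock GradScaler, sketched here:

    import torch

    # Sketch of the dynamic loss-scaling loop behind the logged grad_scale.
    # This is the stock PyTorch recipe; icefall's actual scaler differs in detail.
    scaler = torch.cuda.amp.GradScaler(init_scale=4.0)

    def training_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skipped automatically if grads overflowed
        scaler.update()            # halves the scale on overflow, grows it later
        return scaler.get_scale()  # the value printed as "grad_scale"
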
], batch size: 228, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:42:22,571 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2813916.0, ans=0.125 2023-10-09 16:42:27,964 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2813916.0, ans=0.125 2023-10-09 16:42:37,672 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2813962.6666666665, ans=0.125 2023-10-09 16:42:53,448 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2814009.3333333335, ans=0.1 2023-10-09 16:43:15,002 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2814102.6666666665, ans=0.2 2023-10-09 16:43:16,732 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.801e+02 3.276e+02 4.033e+02 6.396e+02, threshold=6.552e+02, percent-clipped=0.0 2023-10-09 16:43:17,030 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2814102.6666666665, ans=0.2 2023-10-09 16:43:22,462 INFO [train.py:1031] (2/4) Epoch 14, batch 18300, loss[loss=0.1841, simple_loss=0.257, pruned_loss=0.04104, ctc_loss=0.07263, over 16859.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.269, pruned_loss=0.05381, ctc_loss=0.09528, over 3288690.73 frames. ], batch size: 202, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:43:25,593 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2814149.3333333335, ans=0.125 2023-10-09 16:43:39,055 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2814196.0, ans=0.1 2023-10-09 16:43:52,446 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2814242.6666666665, ans=0.1 2023-10-09 16:44:01,695 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2814289.3333333335, ans=0.125 2023-10-09 16:44:04,453 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2814289.3333333335, ans=0.0 2023-10-09 16:44:25,028 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2814382.6666666665, ans=0.1 2023-10-09 16:44:25,852 INFO [train.py:1031] (2/4) Epoch 14, batch 18350, loss[loss=0.2493, simple_loss=0.3466, pruned_loss=0.05529, ctc_loss=0.1036, over 16315.00 frames. ], tot_loss[loss=0.2101, simple_loss=0.274, pruned_loss=0.05386, ctc_loss=0.09601, over 3277288.75 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:44:30,172 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.71 vs. limit=10.0 2023-10-09 16:44:35,170 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.10 vs. 
limit=15.0 2023-10-09 16:44:38,720 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2814429.3333333335, ans=0.125 2023-10-09 16:44:47,784 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2814429.3333333335, ans=0.125 2023-10-09 16:44:57,821 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=22.5 2023-10-09 16:45:08,197 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2814522.6666666665, ans=0.125 2023-10-09 16:45:22,232 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.132e+02 3.059e+02 3.585e+02 4.224e+02 7.359e+02, threshold=7.170e+02, percent-clipped=2.0 2023-10-09 16:45:24,612 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2814569.3333333335, ans=0.125 2023-10-09 16:45:26,278 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2814616.0, ans=0.2 2023-10-09 16:45:26,918 INFO [train.py:1031] (2/4) Epoch 14, batch 18400, loss[loss=0.2597, simple_loss=0.3114, pruned_loss=0.07599, ctc_loss=0.1402, over 16893.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.2807, pruned_loss=0.05644, ctc_loss=0.1006, over 3278542.55 frames. ], batch size: 292, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:45:35,164 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2814616.0, ans=0.0 2023-10-09 16:45:48,577 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2814662.6666666665, ans=0.0 2023-10-09 16:45:54,708 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2814709.3333333335, ans=0.125 2023-10-09 16:46:06,142 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.27 vs. limit=10.0 2023-10-09 16:46:23,467 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2814802.6666666665, ans=0.0 2023-10-09 16:46:27,823 INFO [train.py:1031] (2/4) Epoch 14, batch 18450, loss[loss=0.2135, simple_loss=0.2725, pruned_loss=0.05724, ctc_loss=0.1003, over 16958.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2824, pruned_loss=0.05982, ctc_loss=0.1058, over 3286474.79 frames. ], batch size: 215, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:46:29,591 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.65 vs. limit=15.0 2023-10-09 16:46:35,659 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2814849.3333333335, ans=0.0 2023-10-09 16:46:53,424 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.15 vs. limit=10.0 2023-10-09 16:46:56,859 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.35 vs. 
limit=15.0 2023-10-09 16:46:58,793 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2814942.6666666665, ans=0.125 2023-10-09 16:47:09,079 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2023-10-09 16:47:10,599 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2814989.3333333335, ans=0.2 2023-10-09 16:47:17,995 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2815036.0, ans=0.0 2023-10-09 16:47:26,503 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.678e+02 3.308e+02 3.613e+02 4.264e+02 6.985e+02, threshold=7.226e+02, percent-clipped=0.0 2023-10-09 16:47:30,875 INFO [train.py:1031] (2/4) Epoch 14, batch 18500, loss[loss=0.2708, simple_loss=0.3151, pruned_loss=0.08435, ctc_loss=0.1444, over 16503.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2833, pruned_loss=0.06171, ctc_loss=0.109, over 3300625.71 frames. ], batch size: 416, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:48:00,317 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2815176.0, ans=0.015 2023-10-09 16:48:01,718 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2023-10-09 16:48:03,186 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2815176.0, ans=0.025 2023-10-09 16:48:05,714 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0 2023-10-09 16:48:13,220 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2815222.6666666665, ans=0.0 2023-10-09 16:48:24,895 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2815269.3333333335, ans=0.125 2023-10-09 16:48:32,664 INFO [train.py:1031] (2/4) Epoch 14, batch 18550, loss[loss=0.2486, simple_loss=0.2786, pruned_loss=0.07952, ctc_loss=0.1492, over 15475.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.2876, pruned_loss=0.06484, ctc_loss=0.1141, over 3292251.33 frames. ], batch size: 530, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:48:47,550 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2815362.6666666665, ans=0.1 2023-10-09 16:48:56,864 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2815409.3333333335, ans=0.0 2023-10-09 16:48:57,055 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=12.0 2023-10-09 16:49:03,969 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.73 vs. 
limit=12.0 2023-10-09 16:49:06,791 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2815409.3333333335, ans=0.125 2023-10-09 16:49:21,433 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2815456.0, ans=0.125 2023-10-09 16:49:22,423 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2815502.6666666665, ans=0.125 2023-10-09 16:49:25,404 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2815502.6666666665, ans=0.0 2023-10-09 16:49:34,435 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.785e+02 3.368e+02 3.936e+02 4.731e+02 1.128e+03, threshold=7.872e+02, percent-clipped=2.0 2023-10-09 16:49:36,562 INFO [train.py:1031] (2/4) Epoch 14, batch 18600, loss[loss=0.2457, simple_loss=0.3359, pruned_loss=0.05633, ctc_loss=0.1068, over 16749.00 frames. ], tot_loss[loss=0.2412, simple_loss=0.2991, pruned_loss=0.06776, ctc_loss=0.1196, over 3300807.72 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:49:49,418 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2815596.0, ans=0.125 2023-10-09 16:49:57,216 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2815596.0, ans=0.0 2023-10-09 16:50:00,505 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2815596.0, ans=0.0 2023-10-09 16:50:15,224 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2815689.3333333335, ans=0.025 2023-10-09 16:50:41,207 INFO [train.py:1031] (2/4) Epoch 14, batch 18650, loss[loss=0.2696, simple_loss=0.3211, pruned_loss=0.08094, ctc_loss=0.1405, over 16920.00 frames. ], tot_loss[loss=0.2467, simple_loss=0.305, pruned_loss=0.0697, ctc_loss=0.1227, over 3295429.97 frames. ], batch size: 258, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:50:48,701 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2815782.6666666665, ans=0.125 2023-10-09 16:50:49,731 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:50:56,853 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2815829.3333333335, ans=0.125 2023-10-09 16:51:14,772 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2815876.0, ans=0.0 2023-10-09 16:51:25,315 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2815922.6666666665, ans=0.0 2023-10-09 16:51:25,320 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2815922.6666666665, ans=0.125 2023-10-09 16:51:25,723 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.60 vs. 
limit=10.0 2023-10-09 16:51:28,020 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2815922.6666666665, ans=0.0 2023-10-09 16:51:36,829 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2815969.3333333335, ans=0.0 2023-10-09 16:51:39,548 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2023-10-09 16:51:41,576 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.792e+02 3.349e+02 3.828e+02 4.485e+02 8.259e+02, threshold=7.655e+02, percent-clipped=2.0 2023-10-09 16:51:43,744 INFO [train.py:1031] (2/4) Epoch 14, batch 18700, loss[loss=0.2414, simple_loss=0.3328, pruned_loss=0.05346, ctc_loss=0.1074, over 15179.00 frames. ], tot_loss[loss=0.2479, simple_loss=0.3056, pruned_loss=0.07036, ctc_loss=0.124, over 3295710.50 frames. ], batch size: 527, lr: 2.56e-03, grad_scale: 8.0 2023-10-09 16:52:42,308 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2816202.6666666665, ans=0.1 2023-10-09 16:52:46,840 INFO [train.py:1031] (2/4) Epoch 14, batch 18750, loss[loss=0.202, simple_loss=0.2727, pruned_loss=0.04904, ctc_loss=0.08292, over 16657.00 frames. ], tot_loss[loss=0.2419, simple_loss=0.3032, pruned_loss=0.0667, ctc_loss=0.1181, over 3294327.99 frames. ], batch size: 140, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:52:55,463 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=12.0 2023-10-09 16:52:56,377 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=22.5 2023-10-09 16:53:03,530 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2816296.0, ans=0.125 2023-10-09 16:53:09,819 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2816342.6666666665, ans=0.0 2023-10-09 16:53:21,228 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2816342.6666666665, ans=0.1 2023-10-09 16:53:48,759 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 2.936e+02 3.595e+02 4.298e+02 1.016e+03, threshold=7.191e+02, percent-clipped=2.0 2023-10-09 16:53:48,785 INFO [train.py:1031] (2/4) Epoch 14, batch 18800, loss[loss=0.1915, simple_loss=0.2443, pruned_loss=0.05166, ctc_loss=0.08834, over 16788.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2955, pruned_loss=0.06281, ctc_loss=0.1114, over 3297991.78 frames. ], batch size: 95, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:54:12,475 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=22.5 2023-10-09 16:54:23,955 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.48 vs. 
limit=15.0 2023-10-09 16:54:26,568 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2816622.6666666665, ans=0.0 2023-10-09 16:54:37,740 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2816669.3333333335, ans=0.025 2023-10-09 16:54:48,925 INFO [train.py:1031] (2/4) Epoch 14, batch 18850, loss[loss=0.2271, simple_loss=0.2729, pruned_loss=0.06705, ctc_loss=0.118, over 16955.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2903, pruned_loss=0.06238, ctc_loss=0.1106, over 3299499.46 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:54:59,142 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2816716.0, ans=0.0 2023-10-09 16:54:59,173 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2816716.0, ans=0.125 2023-10-09 16:55:19,971 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2816809.3333333335, ans=0.0 2023-10-09 16:55:37,358 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2023-10-09 16:55:40,317 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2816902.6666666665, ans=0.05 2023-10-09 16:55:49,882 INFO [train.py:1031] (2/4) Epoch 14, batch 18900, loss[loss=0.3078, simple_loss=0.3407, pruned_loss=0.1019, ctc_loss=0.1777, over 16720.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.2907, pruned_loss=0.0647, ctc_loss=0.1141, over 3296326.23 frames. ], batch size: 384, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:55:53,227 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+02 3.158e+02 3.575e+02 4.091e+02 5.831e+02, threshold=7.150e+02, percent-clipped=0.0 2023-10-09 16:55:59,238 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=22.5 2023-10-09 16:56:11,093 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2816996.0, ans=0.125 2023-10-09 16:56:11,136 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2816996.0, ans=0.125 2023-10-09 16:56:13,581 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=22.5 2023-10-09 16:56:17,243 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2817042.6666666665, ans=0.125 2023-10-09 16:56:45,825 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2817136.0, ans=0.1 2023-10-09 16:56:54,183 INFO [train.py:1031] (2/4) Epoch 14, batch 18950, loss[loss=0.2029, simple_loss=0.2609, pruned_loss=0.05368, ctc_loss=0.09386, over 16818.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.2924, pruned_loss=0.06569, ctc_loss=0.1158, over 3293928.90 frames. 
], batch size: 121, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:56:56,629 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2817182.6666666665, ans=0.0 2023-10-09 16:57:11,156 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 16:57:32,877 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2817322.6666666665, ans=0.0 2023-10-09 16:57:42,401 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.22 vs. limit=12.0 2023-10-09 16:57:43,614 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2817369.3333333335, ans=0.125 2023-10-09 16:57:44,587 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2817369.3333333335, ans=0.0 2023-10-09 16:57:52,468 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2817369.3333333335, ans=0.2 2023-10-09 16:57:55,557 INFO [train.py:1031] (2/4) Epoch 14, batch 19000, loss[loss=0.2238, simple_loss=0.2826, pruned_loss=0.06111, ctc_loss=0.1072, over 16903.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2912, pruned_loss=0.06375, ctc_loss=0.1125, over 3292874.24 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 16:57:58,302 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.172e+02 3.278e+02 3.626e+02 4.352e+02 8.941e+02, threshold=7.252e+02, percent-clipped=2.0 2023-10-09 16:58:01,913 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2023-10-09 16:58:30,047 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2817509.3333333335, ans=0.0 2023-10-09 16:58:38,745 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2817556.0, ans=0.0 2023-10-09 16:58:57,276 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2817649.3333333335, ans=0.1 2023-10-09 16:58:57,889 INFO [train.py:1031] (2/4) Epoch 14, batch 19050, loss[loss=0.2113, simple_loss=0.2476, pruned_loss=0.06424, ctc_loss=0.1164, over 15429.00 frames. ], tot_loss[loss=0.231, simple_loss=0.289, pruned_loss=0.06401, ctc_loss=0.1125, over 3294985.93 frames. ], batch size: 529, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 16:59:10,757 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2817696.0, ans=0.125 2023-10-09 16:59:10,954 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=22.5 2023-10-09 16:59:51,872 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=22.5 2023-10-09 17:00:00,787 INFO [train.py:1031] (2/4) Epoch 14, batch 19100, loss[loss=0.2434, simple_loss=0.2885, pruned_loss=0.07285, ctc_loss=0.1316, over 16252.00 frames. 
], tot_loss[loss=0.2356, simple_loss=0.2914, pruned_loss=0.06652, ctc_loss=0.1168, over 3294015.73 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:00:04,649 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.758e+02 3.450e+02 4.008e+02 4.699e+02 1.096e+03, threshold=8.015e+02, percent-clipped=2.0 2023-10-09 17:00:20,013 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0 2023-10-09 17:00:37,169 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2818022.6666666665, ans=0.125 2023-10-09 17:00:55,009 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2818069.3333333335, ans=0.0 2023-10-09 17:01:02,244 INFO [train.py:1031] (2/4) Epoch 14, batch 19150, loss[loss=0.1933, simple_loss=0.2791, pruned_loss=0.03863, ctc_loss=0.07563, over 16843.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2915, pruned_loss=0.0649, ctc_loss=0.1146, over 3295990.69 frames. ], batch size: 228, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:01:15,929 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2818162.6666666665, ans=0.125 2023-10-09 17:01:24,034 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2818162.6666666665, ans=0.125 2023-10-09 17:01:56,357 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2818302.6666666665, ans=0.05 2023-10-09 17:01:58,496 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2818302.6666666665, ans=0.0 2023-10-09 17:02:06,623 INFO [train.py:1031] (2/4) Epoch 14, batch 19200, loss[loss=0.1777, simple_loss=0.2388, pruned_loss=0.04287, ctc_loss=0.07708, over 16866.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2896, pruned_loss=0.06208, ctc_loss=0.1103, over 3295470.36 frames. ], batch size: 141, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:02:12,384 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.374e+02 3.095e+02 3.707e+02 4.645e+02 1.379e+03, threshold=7.414e+02, percent-clipped=4.0 2023-10-09 17:02:13,704 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2818349.3333333335, ans=0.2 2023-10-09 17:02:26,950 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2818396.0, ans=0.125 2023-10-09 17:02:39,564 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2818442.6666666665, ans=0.125 2023-10-09 17:03:02,572 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2818536.0, ans=0.125 2023-10-09 17:03:09,812 INFO [train.py:1031] (2/4) Epoch 14, batch 19250, loss[loss=0.2492, simple_loss=0.3284, pruned_loss=0.06132, ctc_loss=0.1184, over 15137.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2901, pruned_loss=0.06131, ctc_loss=0.1098, over 3286440.18 frames. 
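The scaling.py:979 "Whitening" lines fire when a module measures how far its activations are from having a white (identity-like) covariance; the logged metric is compared against a whitening limit, and a corrective gradient applies once the limit is exceeded. The exact metric is defined in scaling.py; the eigenvalue-based proxy below is for intuition only:

    import torch

    # Intuition-only proxy for the whitening metric: how spread out the
    # eigenvalues of the channel covariance are (1.0 == perfectly white).
    # The real metric in icefall's scaling.py is computed differently.
    def whitening_proxy(x: torch.Tensor) -> float:   # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)            # real eigenvalues, ascending
        return (eigs.max() / eigs.mean().clamp(min=1e-20)).item()
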
], batch size: 527, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:03:23,780 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2023-10-09 17:03:38,298 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2818676.0, ans=0.125 2023-10-09 17:03:46,900 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2818676.0, ans=0.125 2023-10-09 17:03:54,904 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=2818722.6666666665, ans=22.5 2023-10-09 17:04:08,790 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2818769.3333333335, ans=0.0 2023-10-09 17:04:15,646 INFO [train.py:1031] (2/4) Epoch 14, batch 19300, loss[loss=0.2699, simple_loss=0.3161, pruned_loss=0.08185, ctc_loss=0.1499, over 16696.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2901, pruned_loss=0.06223, ctc_loss=0.1112, over 3290356.60 frames. ], batch size: 272, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:04:24,136 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 3.286e+02 3.972e+02 4.950e+02 6.905e+02, threshold=7.944e+02, percent-clipped=0.0 2023-10-09 17:04:24,580 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2818816.0, ans=0.125 2023-10-09 17:04:25,774 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2818816.0, ans=0.0 2023-10-09 17:04:25,800 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2818816.0, ans=0.125 2023-10-09 17:04:28,555 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2818862.6666666665, ans=0.04949747468305833 2023-10-09 17:04:40,955 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2818909.3333333335, ans=0.2 2023-10-09 17:04:53,803 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.29 vs. limit=10.0 2023-10-09 17:04:56,757 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2818956.0, ans=0.0 2023-10-09 17:05:18,463 INFO [train.py:1031] (2/4) Epoch 14, batch 19350, loss[loss=0.2346, simple_loss=0.2585, pruned_loss=0.08007, ctc_loss=0.1266, over 11733.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2907, pruned_loss=0.06307, ctc_loss=0.1124, over 3288095.94 frames. ], batch size: 39, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:05:18,837 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2819049.3333333335, ans=0.125 2023-10-09 17:05:21,588 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2819049.3333333335, ans=0.125 2023-10-09 17:05:44,999 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.39 vs. 
limit=15.0 2023-10-09 17:05:45,683 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2819142.6666666665, ans=0.125 2023-10-09 17:05:51,637 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2819142.6666666665, ans=0.125 2023-10-09 17:06:02,193 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2819189.3333333335, ans=0.0 2023-10-09 17:06:13,261 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=2819236.0, ans=0.95 2023-10-09 17:06:18,198 INFO [train.py:1031] (2/4) Epoch 14, batch 19400, loss[loss=0.1975, simple_loss=0.253, pruned_loss=0.05405, ctc_loss=0.08476, over 16868.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2837, pruned_loss=0.05976, ctc_loss=0.1065, over 3287638.71 frames. ], batch size: 82, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:06:19,523 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2819282.6666666665, ans=0.125 2023-10-09 17:06:25,712 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.971e+02 3.609e+02 4.450e+02 6.456e+02, threshold=7.218e+02, percent-clipped=0.0 2023-10-09 17:06:49,039 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2819376.0, ans=0.0 2023-10-09 17:06:52,816 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819376.0, ans=0.1 2023-10-09 17:06:54,846 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2819422.6666666665, ans=0.0 2023-10-09 17:06:56,010 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2819422.6666666665, ans=0.125 2023-10-09 17:07:18,507 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2819516.0, ans=0.1 2023-10-09 17:07:19,286 INFO [train.py:1031] (2/4) Epoch 14, batch 19450, loss[loss=0.2014, simple_loss=0.2687, pruned_loss=0.04942, ctc_loss=0.08805, over 16859.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2836, pruned_loss=0.06143, ctc_loss=0.1089, over 3298891.80 frames. ], batch size: 189, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:07:20,060 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. 
limit=6.0 2023-10-09 17:07:23,232 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2819516.0, ans=0.0 2023-10-09 17:07:28,638 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2819516.0, ans=0.0 2023-10-09 17:07:50,042 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2819609.3333333335, ans=0.125 2023-10-09 17:08:09,512 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2819702.6666666665, ans=0.0 2023-10-09 17:08:12,972 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2819702.6666666665, ans=0.1 2023-10-09 17:08:19,242 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819702.6666666665, ans=0.1 2023-10-09 17:08:21,518 INFO [train.py:1031] (2/4) Epoch 14, batch 19500, loss[loss=0.2171, simple_loss=0.2747, pruned_loss=0.05936, ctc_loss=0.1018, over 10994.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2849, pruned_loss=0.06044, ctc_loss=0.1073, over 3287636.00 frames. ], batch size: 35, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:08:26,727 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2819749.3333333335, ans=0.125 2023-10-09 17:08:26,830 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2819749.3333333335, ans=0.0 2023-10-09 17:08:30,595 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2819749.3333333335, ans=0.125 2023-10-09 17:08:31,250 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 3.015e+02 3.593e+02 4.173e+02 8.054e+02, threshold=7.186e+02, percent-clipped=2.0 2023-10-09 17:08:45,635 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2819842.6666666665, ans=0.0 2023-10-09 17:08:49,961 INFO [scaling.py:979] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.94 vs. limit=8.0 2023-10-09 17:08:54,638 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-10-09 17:08:55,509 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2819842.6666666665, ans=0.1 2023-10-09 17:08:58,845 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2819889.3333333335, ans=0.125 2023-10-09 17:09:06,331 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2819889.3333333335, ans=0.125 2023-10-09 17:09:09,510 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2819936.0, ans=0.0 2023-10-09 17:09:21,218 INFO [train.py:1031] (2/4) Epoch 14, batch 19550, loss[loss=0.2979, simple_loss=0.3257, pruned_loss=0.1018, ctc_loss=0.166, over 16516.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2889, pruned_loss=0.06361, ctc_loss=0.1128, over 3295243.53 frames. 
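Note that tot_loss is not a single-batch figure: each entry averages over roughly 3.3M frames, i.e. a frame-weighted aggregate over many recent batches, which is why it moves far more smoothly than the per-batch loss printed beside it. A minimal frame-weighted running average in that spirit (train.py's actual aggregation may decay older batches differently):

    # Minimal sketch: frame-weighted running average behind
    # "tot_loss[... over N frames]"; train.py may decay old batches.
    class RunningLoss:
        def __init__(self):
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum += batch_loss * batch_frames
            self.frames += batch_frames

        @property
        def average(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)
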
], batch size: 416, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:09:30,478 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2819982.6666666665, ans=0.2 2023-10-09 17:09:39,531 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2820029.3333333335, ans=0.1 2023-10-09 17:09:44,019 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2820029.3333333335, ans=0.04949747468305833 2023-10-09 17:09:45,048 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2820076.0, ans=0.2 2023-10-09 17:09:52,957 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. limit=15.0 2023-10-09 17:09:59,351 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2820122.6666666665, ans=0.125 2023-10-09 17:10:04,756 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2820122.6666666665, ans=0.0 2023-10-09 17:10:24,798 INFO [train.py:1031] (2/4) Epoch 14, batch 19600, loss[loss=0.2274, simple_loss=0.2935, pruned_loss=0.05982, ctc_loss=0.1039, over 16419.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2853, pruned_loss=0.06276, ctc_loss=0.1118, over 3301622.52 frames. ], batch size: 416, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:10:25,156 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2820216.0, ans=0.125 2023-10-09 17:10:35,229 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.075e+02 3.430e+02 4.007e+02 6.363e+02, threshold=6.860e+02, percent-clipped=0.0 2023-10-09 17:10:41,097 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0 2023-10-09 17:10:45,388 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:10:46,970 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2820262.6666666665, ans=0.1 2023-10-09 17:11:04,529 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2820356.0, ans=0.125 2023-10-09 17:11:20,129 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=12.0 2023-10-09 17:11:28,190 INFO [train.py:1031] (2/4) Epoch 14, batch 19650, loss[loss=0.2311, simple_loss=0.2846, pruned_loss=0.06617, ctc_loss=0.1133, over 16909.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2867, pruned_loss=0.06389, ctc_loss=0.1136, over 3306662.96 frames. ], batch size: 215, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:11:28,549 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2820449.3333333335, ans=0.125 2023-10-09 17:11:42,074 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.43 vs. 
limit=15.0 2023-10-09 17:11:56,956 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2820542.6666666665, ans=0.125 2023-10-09 17:12:01,125 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2820542.6666666665, ans=0.125 2023-10-09 17:12:22,996 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2820636.0, ans=0.1 2023-10-09 17:12:23,001 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2820636.0, ans=0.0 2023-10-09 17:12:30,816 INFO [train.py:1031] (2/4) Epoch 14, batch 19700, loss[loss=0.2226, simple_loss=0.2651, pruned_loss=0.06707, ctc_loss=0.1151, over 16226.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2861, pruned_loss=0.06522, ctc_loss=0.1155, over 3312197.69 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:12:32,625 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.51 vs. limit=12.0 2023-10-09 17:12:42,623 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.670e+02 3.372e+02 3.843e+02 4.494e+02 9.285e+02, threshold=7.687e+02, percent-clipped=3.0 2023-10-09 17:12:46,801 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2820729.3333333335, ans=0.0 2023-10-09 17:12:51,021 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2820729.3333333335, ans=0.0 2023-10-09 17:13:21,564 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2820869.3333333335, ans=0.0 2023-10-09 17:13:31,687 INFO [train.py:1031] (2/4) Epoch 14, batch 19750, loss[loss=0.2396, simple_loss=0.3417, pruned_loss=0.04849, ctc_loss=0.1016, over 16245.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2874, pruned_loss=0.06392, ctc_loss=0.1132, over 3316434.51 frames. ], batch size: 463, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:13:57,009 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2821009.3333333335, ans=0.0 2023-10-09 17:14:01,762 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2821009.3333333335, ans=0.0 2023-10-09 17:14:11,268 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2821056.0, ans=0.1 2023-10-09 17:14:34,793 INFO [train.py:1031] (2/4) Epoch 14, batch 19800, loss[loss=0.2402, simple_loss=0.2906, pruned_loss=0.07208, ctc_loss=0.1141, over 16606.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2908, pruned_loss=0.06412, ctc_loss=0.1137, over 3314416.96 frames. 
], batch size: 110, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:14:43,890 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2821149.3333333335, ans=0.0 2023-10-09 17:14:47,455 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.293e+02 3.299e+02 3.756e+02 4.593e+02 7.524e+02, threshold=7.512e+02, percent-clipped=0.0 2023-10-09 17:14:53,217 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2821196.0, ans=0.1 2023-10-09 17:15:02,260 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2023-10-09 17:15:06,653 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2821242.6666666665, ans=0.125 2023-10-09 17:15:10,662 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2821242.6666666665, ans=0.125 2023-10-09 17:15:13,549 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2821289.3333333335, ans=0.0 2023-10-09 17:15:20,908 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2821289.3333333335, ans=0.125 2023-10-09 17:15:24,985 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2821336.0, ans=0.2 2023-10-09 17:15:37,183 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2821336.0, ans=0.125 2023-10-09 17:15:38,927 INFO [train.py:1031] (2/4) Epoch 14, batch 19850, loss[loss=0.2291, simple_loss=0.2892, pruned_loss=0.06277, ctc_loss=0.1085, over 16988.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.2947, pruned_loss=0.0671, ctc_loss=0.1184, over 3312969.97 frames. ], batch size: 216, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:15:47,201 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.50 vs. limit=15.0 2023-10-09 17:16:11,721 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-10-09 17:16:22,557 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2821522.6666666665, ans=0.1 2023-10-09 17:16:23,620 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2821522.6666666665, ans=0.1 2023-10-09 17:16:32,468 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.04 vs. limit=15.0 2023-10-09 17:16:39,918 INFO [train.py:1031] (2/4) Epoch 14, batch 19900, loss[loss=0.2443, simple_loss=0.2981, pruned_loss=0.06972, ctc_loss=0.1276, over 16304.00 frames. ], tot_loss[loss=0.2403, simple_loss=0.2961, pruned_loss=0.06824, ctc_loss=0.1201, over 3312102.36 frames. 
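
The `optim.py:471` lines report five grad-norm statistics (by appearance, min/25%/median/75%/max of recently observed gradient norms) plus a clipping threshold, and in every entry above the threshold equals `Clipping_scale` times the middle value, e.g. 2.0 × 3.609e+02 = 7.218e+02 and 2.0 × 4.204e+02 = 8.408e+02. Below is a sketch of that scheme; the sliding-window size and the in-place rescaling are assumptions of the sketch, not read off the log.

```python
# Sketch of median-based gradient clipping consistent with the optim.py
# lines above: threshold = clipping_scale * median(recent grad norms).
# The window size and the exact rescaling mechanics are assumptions.
from collections import deque

import numpy as np
import torch


class MedianGradClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1024):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.detach().norm() for p in params])
        ).item()
        self.norms.append(norm)
        q = np.quantile(self.norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = self.scale * q[2]   # scale * median of recent norms
        if norm > threshold:            # rescale gradients in place
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold
```

The `percent-clipped` figure would then be the fraction of recent batches whose gradient norm exceeded that threshold.
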
], batch size: 463, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:16:49,400 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2821616.0, ans=0.2 2023-10-09 17:16:50,477 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2821616.0, ans=0.125 2023-10-09 17:16:54,365 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.744e+02 3.692e+02 4.204e+02 4.980e+02 8.655e+02, threshold=8.408e+02, percent-clipped=2.0 2023-10-09 17:17:03,411 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2821709.3333333335, ans=0.125 2023-10-09 17:17:07,644 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2821709.3333333335, ans=0.0 2023-10-09 17:17:11,971 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2821709.3333333335, ans=0.125 2023-10-09 17:17:20,872 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=22.5 2023-10-09 17:17:41,865 INFO [train.py:1031] (2/4) Epoch 14, batch 19950, loss[loss=0.2655, simple_loss=0.3081, pruned_loss=0.08366, ctc_loss=0.1386, over 16776.00 frames. ], tot_loss[loss=0.2424, simple_loss=0.296, pruned_loss=0.06988, ctc_loss=0.1226, over 3311006.70 frames. ], batch size: 130, lr: 2.56e-03, grad_scale: 2.0 2023-10-09 17:17:52,005 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2821849.3333333335, ans=0.125 2023-10-09 17:18:16,788 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2821942.6666666665, ans=0.5 2023-10-09 17:18:42,975 INFO [train.py:1031] (2/4) Epoch 14, batch 20000, loss[loss=0.2411, simple_loss=0.2941, pruned_loss=0.06929, ctc_loss=0.124, over 16792.00 frames. ], tot_loss[loss=0.2462, simple_loss=0.2982, pruned_loss=0.07198, ctc_loss=0.1257, over 3308100.67 frames. ], batch size: 309, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:18:54,730 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2822129.3333333335, ans=0.0 2023-10-09 17:18:57,871 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.91 vs. limit=15.0 2023-10-09 17:18:58,233 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.512e+02 3.390e+02 3.727e+02 4.517e+02 8.491e+02, threshold=7.454e+02, percent-clipped=1.0 2023-10-09 17:19:12,588 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2822176.0, ans=0.07 2023-10-09 17:19:15,449 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2822176.0, ans=0.5 2023-10-09 17:19:32,397 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.27 vs. 
limit=15.0 2023-10-09 17:19:33,636 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2822269.3333333335, ans=0.0 2023-10-09 17:19:41,945 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2822269.3333333335, ans=0.1 2023-10-09 17:19:46,371 INFO [train.py:1031] (2/4) Epoch 14, batch 20050, loss[loss=0.1735, simple_loss=0.2262, pruned_loss=0.04539, ctc_loss=0.07514, over 16798.00 frames. ], tot_loss[loss=0.238, simple_loss=0.2894, pruned_loss=0.06918, ctc_loss=0.1204, over 3298279.95 frames. ], batch size: 188, lr: 2.56e-03, grad_scale: 4.0 2023-10-09 17:19:53,714 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.54 vs. limit=15.0 2023-10-09 17:20:15,112 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2822409.3333333335, ans=15.0 2023-10-09 17:20:42,539 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2822502.6666666665, ans=0.1 2023-10-09 17:20:43,675 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2822502.6666666665, ans=0.125 2023-10-09 17:20:50,097 INFO [train.py:1031] (2/4) Epoch 14, batch 20100, loss[loss=0.2222, simple_loss=0.2615, pruned_loss=0.06879, ctc_loss=0.1134, over 11316.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2824, pruned_loss=0.06641, ctc_loss=0.1152, over 3296227.91 frames. ], batch size: 38, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:20:52,532 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=12.0 2023-10-09 17:21:00,743 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.74 vs. limit=12.0 2023-10-09 17:21:07,904 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.372e+02 3.352e+02 3.979e+02 4.568e+02 7.750e+02, threshold=7.958e+02, percent-clipped=1.0 2023-10-09 17:21:40,556 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.56 vs. limit=15.0 2023-10-09 17:21:54,780 INFO [train.py:1031] (2/4) Epoch 14, batch 20150, loss[loss=0.1653, simple_loss=0.2096, pruned_loss=0.04559, ctc_loss=0.07432, over 12132.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.2879, pruned_loss=0.06584, ctc_loss=0.1153, over 3293406.11 frames. 
], batch size: 41, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:22:08,422 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2822829.3333333335, ans=0.2 2023-10-09 17:22:16,527 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2822829.3333333335, ans=0.0 2023-10-09 17:22:23,570 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2822876.0, ans=0.2 2023-10-09 17:22:35,812 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2822922.6666666665, ans=0.125 2023-10-09 17:22:53,579 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=2822969.3333333335, ans=0.125 2023-10-09 17:22:55,974 INFO [train.py:1031] (2/4) Epoch 14, batch 20200, loss[loss=0.2167, simple_loss=0.2812, pruned_loss=0.05717, ctc_loss=0.0948, over 16773.00 frames. ], tot_loss[loss=0.237, simple_loss=0.2939, pruned_loss=0.06658, ctc_loss=0.1171, over 3293737.65 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:23:02,906 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2823016.0, ans=0.1 2023-10-09 17:23:12,688 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.574e+02 3.410e+02 4.005e+02 4.580e+02 8.040e+02, threshold=8.011e+02, percent-clipped=1.0 2023-10-09 17:23:13,073 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2823062.6666666665, ans=0.0 2023-10-09 17:23:14,367 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.19 vs. limit=6.0 2023-10-09 17:23:42,249 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2823156.0, ans=0.2 2023-10-09 17:23:46,119 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2823202.6666666665, ans=0.0 2023-10-09 17:23:53,414 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2823202.6666666665, ans=0.0 2023-10-09 17:23:55,824 INFO [train.py:1031] (2/4) Epoch 14, batch 20250, loss[loss=0.1912, simple_loss=0.2675, pruned_loss=0.04156, ctc_loss=0.07942, over 16950.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2924, pruned_loss=0.06599, ctc_loss=0.1164, over 3297739.29 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 1.0 2023-10-09 17:24:44,087 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2823389.3333333335, ans=0.0 2023-10-09 17:24:50,843 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2023-10-09 17:24:58,235 INFO [train.py:1031] (2/4) Epoch 14, batch 20300, loss[loss=0.1953, simple_loss=0.2494, pruned_loss=0.05334, ctc_loss=0.0862, over 16857.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.287, pruned_loss=0.06297, ctc_loss=0.1114, over 3302134.25 frames. 
], batch size: 121, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:25:09,274 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2823482.6666666665, ans=0.125 2023-10-09 17:25:13,580 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2823529.3333333335, ans=0.125 2023-10-09 17:25:18,638 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.404e+02 3.144e+02 3.729e+02 4.448e+02 8.440e+02, threshold=7.458e+02, percent-clipped=1.0 2023-10-09 17:25:23,880 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2823576.0, ans=0.04949747468305833 2023-10-09 17:25:31,018 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=15.0 2023-10-09 17:25:35,679 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=15.0 2023-10-09 17:25:39,445 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2823622.6666666665, ans=0.125 2023-10-09 17:25:56,055 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=2823669.3333333335, ans=0.0 2023-10-09 17:26:00,740 INFO [train.py:1031] (2/4) Epoch 14, batch 20350, loss[loss=0.1997, simple_loss=0.2561, pruned_loss=0.05356, ctc_loss=0.09036, over 16896.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2806, pruned_loss=0.0622, ctc_loss=0.1093, over 3301990.12 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:26:22,915 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2823762.6666666665, ans=0.2 2023-10-09 17:26:32,680 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2823809.3333333335, ans=0.125 2023-10-09 17:26:38,467 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2823856.0, ans=0.125 2023-10-09 17:26:57,352 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2823902.6666666665, ans=0.125 2023-10-09 17:27:02,817 INFO [train.py:1031] (2/4) Epoch 14, batch 20400, loss[loss=0.2423, simple_loss=0.3241, pruned_loss=0.06069, ctc_loss=0.09779, over 15200.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2809, pruned_loss=0.06278, ctc_loss=0.1088, over 3292260.81 frames. 
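
The `scaling.py:979` Whitening entries compare a per-module `metric` against a `limit` (6.0 up to 22.5 above); the metric gauges how far the channel covariance of that module's activations is from being proportional to the identity, i.e. how "non-white" the features are. One plausible formulation, used here purely as an illustration since the log only shows the resulting numbers, is d·tr(C²)/tr(C)², which equals 1 for perfectly white features and grows with anisotropy.

```python
# Illustrative whitening metric: d * trace(C @ C) / trace(C)**2 on a
# (frames, channels) activation matrix. It equals 1.0 when the channel
# covariance C is proportional to the identity and grows as C becomes
# more anisotropic. The exact formula is an assumption of this sketch.
import torch


def whitening_metric(x: torch.Tensor) -> float:
    x = x - x.mean(dim=0, keepdim=True)           # center over frames
    d = x.shape[1]
    cov = (x.T @ x) / x.shape[0]                  # (d, d) channel covariance
    return (d * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()


white = torch.randn(1000, 192)                    # roughly isotropic
skewed = white * torch.linspace(0.1, 3.0, 192)    # per-channel scaling
print(whitening_metric(white))   # close to 1
print(whitening_metric(skewed))  # noticeably larger
```
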
], batch size: 526, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:27:08,771 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2823949.3333333335, ans=0.2 2023-10-09 17:27:23,293 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.551e+02 3.333e+02 4.109e+02 4.919e+02 1.143e+03, threshold=8.217e+02, percent-clipped=3.0 2023-10-09 17:27:27,640 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2824042.6666666665, ans=0.0 2023-10-09 17:27:54,456 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2824136.0, ans=0.09899494936611666 2023-10-09 17:28:04,147 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2824136.0, ans=0.125 2023-10-09 17:28:05,994 INFO [train.py:1031] (2/4) Epoch 14, batch 20450, loss[loss=0.2073, simple_loss=0.2698, pruned_loss=0.05392, ctc_loss=0.09258, over 16860.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2798, pruned_loss=0.06222, ctc_loss=0.1067, over 3287510.97 frames. ], batch size: 243, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:28:07,369 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2824182.6666666665, ans=0.125 2023-10-09 17:28:42,448 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2824276.0, ans=0.125 2023-10-09 17:28:44,511 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=22.5 2023-10-09 17:28:51,024 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2824322.6666666665, ans=0.0 2023-10-09 17:28:55,495 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2824322.6666666665, ans=0.125 2023-10-09 17:29:11,358 INFO [train.py:1031] (2/4) Epoch 14, batch 20500, loss[loss=0.2399, simple_loss=0.319, pruned_loss=0.05887, ctc_loss=0.1077, over 16860.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2793, pruned_loss=0.05972, ctc_loss=0.1026, over 3290551.85 frames. ], batch size: 242, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:29:24,807 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.65 vs. limit=22.5 2023-10-09 17:29:32,932 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.295e+02 3.206e+02 4.100e+02 5.464e+02 8.452e+02, threshold=8.200e+02, percent-clipped=1.0 2023-10-09 17:29:40,238 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2824509.3333333335, ans=0.1 2023-10-09 17:29:45,470 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2023-10-09 17:30:15,067 INFO [train.py:1031] (2/4) Epoch 14, batch 20550, loss[loss=0.1941, simple_loss=0.2552, pruned_loss=0.05001, ctc_loss=0.0823, over 16768.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.288, pruned_loss=0.06024, ctc_loss=0.1043, over 3290207.42 frames. 
], batch size: 121, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:30:22,693 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2824649.3333333335, ans=0.125 2023-10-09 17:30:26,416 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2824696.0, ans=0.0 2023-10-09 17:30:39,205 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2824742.6666666665, ans=0.125 2023-10-09 17:30:48,856 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2824742.6666666665, ans=0.0 2023-10-09 17:30:53,534 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=12.0 2023-10-09 17:30:57,955 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2824789.3333333335, ans=0.125 2023-10-09 17:31:14,113 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2824836.0, ans=0.125 2023-10-09 17:31:15,678 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2824836.0, ans=0.0 2023-10-09 17:31:17,511 INFO [train.py:1031] (2/4) Epoch 14, batch 20600, loss[loss=0.1948, simple_loss=0.2597, pruned_loss=0.04877, ctc_loss=0.08083, over 16760.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2932, pruned_loss=0.06095, ctc_loss=0.1065, over 3289713.51 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:31:19,007 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2824882.6666666665, ans=0.2 2023-10-09 17:31:29,900 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2824929.3333333335, ans=0.2 2023-10-09 17:31:35,785 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2824929.3333333335, ans=0.0 2023-10-09 17:31:36,991 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2824929.3333333335, ans=0.125 2023-10-09 17:31:40,865 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.711e+02 4.411e+02 5.380e+02 7.131e+02, threshold=8.823e+02, percent-clipped=0.0 2023-10-09 17:32:16,778 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2825069.3333333335, ans=0.0 2023-10-09 17:32:20,106 INFO [train.py:1031] (2/4) Epoch 14, batch 20650, loss[loss=0.2587, simple_loss=0.2977, pruned_loss=0.0809, ctc_loss=0.145, over 15201.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.2983, pruned_loss=0.06484, ctc_loss=0.1136, over 3292698.65 frames. 
], batch size: 526, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:32:22,246 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2825116.0, ans=0.05 2023-10-09 17:32:30,322 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2825116.0, ans=0.05 2023-10-09 17:32:33,970 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2825162.6666666665, ans=0.125 2023-10-09 17:32:40,115 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2825162.6666666665, ans=0.125 2023-10-09 17:32:44,923 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.88 vs. limit=22.5 2023-10-09 17:33:01,170 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2825256.0, ans=0.125 2023-10-09 17:33:14,724 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2825302.6666666665, ans=0.125 2023-10-09 17:33:21,885 INFO [train.py:1031] (2/4) Epoch 14, batch 20700, loss[loss=0.222, simple_loss=0.2887, pruned_loss=0.05801, ctc_loss=0.09815, over 16732.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.297, pruned_loss=0.06568, ctc_loss=0.115, over 3306129.13 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 17:33:38,345 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2825396.0, ans=0.0 2023-10-09 17:33:45,255 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+02 3.306e+02 3.691e+02 4.281e+02 9.878e+02, threshold=7.382e+02, percent-clipped=2.0 2023-10-09 17:33:48,924 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2825442.6666666665, ans=0.1 2023-10-09 17:33:50,556 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2825442.6666666665, ans=0.125 2023-10-09 17:33:54,293 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2825442.6666666665, ans=0.0 2023-10-09 17:34:01,045 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2825489.3333333335, ans=0.0 2023-10-09 17:34:14,436 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2825536.0, ans=0.125 2023-10-09 17:34:14,500 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2825536.0, ans=0.0 2023-10-09 17:34:15,533 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2825536.0, ans=0.0 2023-10-09 17:34:22,954 INFO [train.py:1031] (2/4) Epoch 14, batch 20750, loss[loss=0.2432, simple_loss=0.2976, pruned_loss=0.06952, ctc_loss=0.1244, over 16817.00 frames. ], tot_loss[loss=0.2393, simple_loss=0.2958, pruned_loss=0.06765, ctc_loss=0.1186, over 3311991.33 frames. 
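
The `grad_scale` field at the end of each `train.py:1031` summary moves between 1.0 and 8.0 across the batches above, the characteristic behaviour of dynamic fp16 loss scaling: the scale is cut after overflowing steps and grown back during stable stretches. The generic sketch below uses PyTorch's own AMP scaler to show the mechanism; the model, optimizer, and data here are placeholders, not this recipe's objects.

```python
# Generic dynamic loss-scaling loop with torch.cuda.amp; the oscillating
# grad_scale values in the log are consistent with this kind of scheme.
# Model, optimizer, and inputs are placeholders for illustration only.
import torch

model = torch.nn.Linear(80, 2000).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=2.56e-03)
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    x = torch.randn(8, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # skips the step if grads overflowed
    scaler.update()                 # shrinks/grows the scale accordingly
    if step % 50 == 0:
        print("grad_scale:", scaler.get_scale())
```
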
], batch size: 272, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:34:23,888 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2825582.6666666665, ans=0.125 2023-10-09 17:34:27,278 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=22.5 2023-10-09 17:34:35,507 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.06 vs. limit=15.0 2023-10-09 17:34:46,384 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2825676.0, ans=0.0 2023-10-09 17:34:55,415 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2825676.0, ans=0.125 2023-10-09 17:35:17,475 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2023-10-09 17:35:23,262 INFO [train.py:1031] (2/4) Epoch 14, batch 20800, loss[loss=0.2195, simple_loss=0.2919, pruned_loss=0.05292, ctc_loss=0.1034, over 16901.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.2953, pruned_loss=0.06687, ctc_loss=0.1183, over 3307555.18 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:35:46,252 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.527e+02 3.235e+02 3.640e+02 4.210e+02 8.474e+02, threshold=7.280e+02, percent-clipped=1.0 2023-10-09 17:36:00,925 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2825956.0, ans=0.0 2023-10-09 17:36:02,086 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2825956.0, ans=10.0 2023-10-09 17:36:04,678 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2825956.0, ans=0.125 2023-10-09 17:36:12,523 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2826002.6666666665, ans=0.125 2023-10-09 17:36:22,180 INFO [train.py:1031] (2/4) Epoch 14, batch 20850, loss[loss=0.2051, simple_loss=0.2653, pruned_loss=0.05211, ctc_loss=0.1014, over 15496.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2896, pruned_loss=0.06318, ctc_loss=0.1129, over 3298279.29 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:36:25,297 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2826049.3333333335, ans=0.125 2023-10-09 17:36:40,424 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:36:57,749 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2826142.6666666665, ans=0.05 2023-10-09 17:37:02,701 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2826189.3333333335, ans=0.1 2023-10-09 17:37:03,216 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. 
limit=6.0 2023-10-09 17:37:08,622 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2826189.3333333335, ans=0.125 2023-10-09 17:37:12,979 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:37:18,150 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2826236.0, ans=0.125 2023-10-09 17:37:20,337 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2826236.0, ans=0.125 2023-10-09 17:37:22,226 INFO [train.py:1031] (2/4) Epoch 14, batch 20900, loss[loss=0.2156, simple_loss=0.2579, pruned_loss=0.06508, ctc_loss=0.108, over 16566.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2854, pruned_loss=0.06098, ctc_loss=0.1088, over 3293327.98 frames. ], batch size: 110, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:37:33,611 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2826329.3333333335, ans=0.125 2023-10-09 17:37:48,711 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.791e+02 3.163e+02 3.693e+02 7.251e+02, threshold=6.327e+02, percent-clipped=0.0 2023-10-09 17:37:50,226 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2826376.0, ans=0.125 2023-10-09 17:38:00,105 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2826422.6666666665, ans=0.0 2023-10-09 17:38:22,287 INFO [train.py:1031] (2/4) Epoch 14, batch 20950, loss[loss=0.1835, simple_loss=0.222, pruned_loss=0.05302, ctc_loss=0.09726, over 16146.00 frames. ], tot_loss[loss=0.2205, simple_loss=0.278, pruned_loss=0.06011, ctc_loss=0.1068, over 3290244.97 frames. ], batch size: 466, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:38:23,718 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=2826516.0, ans=0.2 2023-10-09 17:38:30,649 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2826516.0, ans=0.1 2023-10-09 17:38:39,534 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2826562.6666666665, ans=0.125 2023-10-09 17:38:44,388 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2826562.6666666665, ans=0.0 2023-10-09 17:38:48,811 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2826609.3333333335, ans=0.0 2023-10-09 17:39:02,074 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=22.5 2023-10-09 17:39:13,950 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2826702.6666666665, ans=0.125 2023-10-09 17:39:21,704 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.33 vs. 
limit=22.5 2023-10-09 17:39:23,279 INFO [train.py:1031] (2/4) Epoch 14, batch 21000, loss[loss=0.2457, simple_loss=0.3229, pruned_loss=0.06309, ctc_loss=0.106, over 16805.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2786, pruned_loss=0.06174, ctc_loss=0.1092, over 3283357.44 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:39:23,279 INFO [train.py:1054] (2/4) Computing validation loss 2023-10-09 17:39:41,353 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2348, simple_loss=0.3049, pruned_loss=0.06333, ctc_loss=0.09533, over 1796401.00 frames. 2023-10-09 17:39:41,354 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB 2023-10-09 17:39:41,640 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2826749.3333333335, ans=0.1 2023-10-09 17:39:42,175 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.55 vs. limit=10.0 2023-10-09 17:39:46,912 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2826749.3333333335, ans=10.0 2023-10-09 17:39:56,094 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2826796.0, ans=0.125 2023-10-09 17:40:07,007 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.638e+02 3.238e+02 3.624e+02 4.210e+02 7.239e+02, threshold=7.249e+02, percent-clipped=3.0 2023-10-09 17:40:13,313 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2826842.6666666665, ans=0.0 2023-10-09 17:40:24,683 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2826889.3333333335, ans=15.0 2023-10-09 17:40:29,695 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2826936.0, ans=0.0 2023-10-09 17:40:31,324 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2826936.0, ans=0.025 2023-10-09 17:40:36,152 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2826936.0, ans=0.2 2023-10-09 17:40:39,034 INFO [train.py:1031] (2/4) Epoch 14, batch 21050, loss[loss=0.2862, simple_loss=0.3411, pruned_loss=0.0845, ctc_loss=0.1559, over 16342.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2829, pruned_loss=0.06137, ctc_loss=0.108, over 3275119.14 frames. ], batch size: 414, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:41:11,599 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2827076.0, ans=0.125 2023-10-09 17:41:11,632 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2827076.0, ans=0.125 2023-10-09 17:41:29,926 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2827169.3333333335, ans=0.0 2023-10-09 17:41:36,264 INFO [train.py:1031] (2/4) Epoch 14, batch 21100, loss[loss=0.2275, simple_loss=0.2766, pruned_loss=0.06757, ctc_loss=0.108, over 16635.00 frames. 
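
Batch 21000 above triggers a validation pass (`train.py:1054`-`1064`): training pauses, a held-out loss is computed over roughly 1.8M frames, and peak GPU memory is reported. A minimal sketch of that pattern follows; the loader, the loss function, and the frame-weighted aggregation are assumptions of the sketch.

```python
# Minimal validation-pass sketch matching the train.py:1054-1064 lines:
# switch to eval mode, accumulate frame-weighted losses without gradients,
# then report peak memory. Loader and loss_fn are hypothetical placeholders.
import torch


def compute_validation_loss(model, valid_loader, loss_fn, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = loss_fn(model, batch, device)
            tot_loss += loss.item() * num_frames   # frame-weighted sum
            tot_frames += num_frames
    model.train()
    if torch.cuda.is_available():
        mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mb}MB")
    return tot_loss / tot_frames
```
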
], tot_loss[loss=0.2224, simple_loss=0.2825, pruned_loss=0.06021, ctc_loss=0.1047, over 3291624.93 frames. ], batch size: 151, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:41:36,462 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2827216.0, ans=0.0 2023-10-09 17:41:38,753 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.08 vs. limit=15.0 2023-10-09 17:42:02,340 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2827309.3333333335, ans=0.0 2023-10-09 17:42:03,031 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2827309.3333333335, ans=0.125 2023-10-09 17:42:05,336 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.720e+02 3.068e+02 3.590e+02 8.081e+02, threshold=6.137e+02, percent-clipped=1.0 2023-10-09 17:42:37,594 INFO [train.py:1031] (2/4) Epoch 14, batch 21150, loss[loss=0.202, simple_loss=0.262, pruned_loss=0.05348, ctc_loss=0.08767, over 16959.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2791, pruned_loss=0.06034, ctc_loss=0.1045, over 3299674.34 frames. ], batch size: 86, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:42:43,617 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2827449.3333333335, ans=0.95 2023-10-09 17:42:43,672 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2827449.3333333335, ans=0.125 2023-10-09 17:42:45,706 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2827449.3333333335, ans=0.125 2023-10-09 17:43:00,176 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2827542.6666666665, ans=0.025 2023-10-09 17:43:02,319 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2827542.6666666665, ans=0.125 2023-10-09 17:43:20,356 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2827589.3333333335, ans=0.0 2023-10-09 17:43:36,527 INFO [train.py:1031] (2/4) Epoch 14, batch 21200, loss[loss=0.1871, simple_loss=0.2466, pruned_loss=0.04718, ctc_loss=0.08301, over 16682.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.274, pruned_loss=0.06056, ctc_loss=0.105, over 3305229.11 frames. 
], batch size: 102, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:43:36,859 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2827682.6666666665, ans=0.09899494936611666 2023-10-09 17:43:42,833 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2827682.6666666665, ans=0.125 2023-10-09 17:43:52,910 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2827729.3333333335, ans=0.0 2023-10-09 17:43:56,550 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2827729.3333333335, ans=0.0 2023-10-09 17:44:00,353 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:44:07,100 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 3.239e+02 3.845e+02 5.038e+02 8.843e+02, threshold=7.690e+02, percent-clipped=9.0 2023-10-09 17:44:07,381 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2827776.0, ans=0.0 2023-10-09 17:44:14,462 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2827822.6666666665, ans=0.125 2023-10-09 17:44:31,877 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2827869.3333333335, ans=0.0 2023-10-09 17:44:34,670 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2827869.3333333335, ans=0.125 2023-10-09 17:44:35,018 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.75 vs. limit=10.0 2023-10-09 17:44:39,123 INFO [train.py:1031] (2/4) Epoch 14, batch 21250, loss[loss=0.2136, simple_loss=0.2686, pruned_loss=0.05948, ctc_loss=0.09928, over 16698.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2727, pruned_loss=0.05795, ctc_loss=0.1009, over 3302888.00 frames. ], batch size: 130, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:44:41,222 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2827916.0, ans=0.1 2023-10-09 17:44:47,494 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2827916.0, ans=0.0 2023-10-09 17:44:57,455 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2827962.6666666665, ans=0.125 2023-10-09 17:45:07,380 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=12.0 2023-10-09 17:45:30,397 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2828102.6666666665, ans=0.0 2023-10-09 17:45:36,783 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:45:42,989 INFO [train.py:1031] (2/4) Epoch 14, batch 21300, loss[loss=0.2077, simple_loss=0.2686, pruned_loss=0.05532, ctc_loss=0.09013, over 16710.00 frames. 
], tot_loss[loss=0.2256, simple_loss=0.2859, pruned_loss=0.06125, ctc_loss=0.1071, over 3301102.18 frames. ], batch size: 140, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:45:44,421 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2828149.3333333335, ans=0.125 2023-10-09 17:46:14,301 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.457e+02 3.461e+02 4.159e+02 5.409e+02 1.290e+03, threshold=8.318e+02, percent-clipped=7.0 2023-10-09 17:46:35,705 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2828336.0, ans=0.07 2023-10-09 17:46:39,464 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2828336.0, ans=0.0 2023-10-09 17:46:45,014 INFO [train.py:1031] (2/4) Epoch 14, batch 21350, loss[loss=0.2277, simple_loss=0.2949, pruned_loss=0.05879, ctc_loss=0.1073, over 16372.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2866, pruned_loss=0.06003, ctc_loss=0.1057, over 3297006.17 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:47:02,269 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=2828429.3333333335, ans=0.2 2023-10-09 17:47:04,846 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2828429.3333333335, ans=0.125 2023-10-09 17:47:05,944 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2828429.3333333335, ans=0.1 2023-10-09 17:47:10,597 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2828476.0, ans=0.2 2023-10-09 17:47:22,722 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2828522.6666666665, ans=0.1 2023-10-09 17:47:30,108 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2828522.6666666665, ans=0.0 2023-10-09 17:47:35,001 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2828569.3333333335, ans=0.0 2023-10-09 17:47:35,423 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.12 vs. limit=15.0 2023-10-09 17:47:40,256 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=22.5 2023-10-09 17:47:47,067 INFO [train.py:1031] (2/4) Epoch 14, batch 21400, loss[loss=0.2173, simple_loss=0.2739, pruned_loss=0.0596, ctc_loss=0.1037, over 16833.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2854, pruned_loss=0.06189, ctc_loss=0.1088, over 3299930.68 frames. 
], batch size: 272, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 17:47:55,952 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2828616.0, ans=0.0 2023-10-09 17:47:58,608 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2828662.6666666665, ans=0.0 2023-10-09 17:47:59,884 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.95 vs. limit=10.0 2023-10-09 17:48:02,483 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2828662.6666666665, ans=0.05 2023-10-09 17:48:04,709 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2828662.6666666665, ans=0.125 2023-10-09 17:48:07,431 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2828662.6666666665, ans=0.0 2023-10-09 17:48:19,464 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.290e+02 3.090e+02 3.533e+02 3.983e+02 1.095e+03, threshold=7.067e+02, percent-clipped=1.0 2023-10-09 17:48:19,845 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2828709.3333333335, ans=10.0 2023-10-09 17:48:47,007 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2828802.6666666665, ans=0.0 2023-10-09 17:48:48,157 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0 2023-10-09 17:48:48,680 INFO [train.py:1031] (2/4) Epoch 14, batch 21450, loss[loss=0.2151, simple_loss=0.2564, pruned_loss=0.06458, ctc_loss=0.1116, over 16762.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2797, pruned_loss=0.06197, ctc_loss=0.1085, over 3305732.66 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 1.0 2023-10-09 17:48:50,006 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2828849.3333333335, ans=0.0 2023-10-09 17:48:55,151 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=2828849.3333333335, ans=6.0 2023-10-09 17:49:03,377 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2828896.0, ans=0.125 2023-10-09 17:49:16,379 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2828942.6666666665, ans=0.07 2023-10-09 17:49:28,560 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2828989.3333333335, ans=0.0 2023-10-09 17:49:32,858 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2828989.3333333335, ans=0.1 2023-10-09 17:49:49,289 INFO [train.py:1031] (2/4) Epoch 14, batch 21500, loss[loss=0.2234, simple_loss=0.2722, pruned_loss=0.06498, ctc_loss=0.1116, over 16578.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.276, pruned_loss=0.06262, ctc_loss=0.1094, over 3310924.83 frames. 
], batch size: 110, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:50:22,938 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+02 3.091e+02 3.543e+02 4.001e+02 7.738e+02, threshold=7.086e+02, percent-clipped=2.0 2023-10-09 17:50:25,295 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2829222.6666666665, ans=0.0 2023-10-09 17:50:46,772 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2829269.3333333335, ans=0.125 2023-10-09 17:50:47,001 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2829269.3333333335, ans=15.0 2023-10-09 17:50:49,221 INFO [train.py:1031] (2/4) Epoch 14, batch 21550, loss[loss=0.2033, simple_loss=0.2635, pruned_loss=0.05365, ctc_loss=0.08933, over 16654.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2735, pruned_loss=0.06251, ctc_loss=0.1086, over 3299331.67 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:51:12,894 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2829362.6666666665, ans=0.125 2023-10-09 17:51:20,938 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2829409.3333333335, ans=0.0 2023-10-09 17:51:31,432 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2829456.0, ans=0.125 2023-10-09 17:51:36,301 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2829456.0, ans=0.125 2023-10-09 17:51:45,674 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2829502.6666666665, ans=0.1 2023-10-09 17:51:47,940 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2829502.6666666665, ans=0.0 2023-10-09 17:51:52,384 INFO [train.py:1031] (2/4) Epoch 14, batch 21600, loss[loss=0.2563, simple_loss=0.3102, pruned_loss=0.07343, ctc_loss=0.1391, over 16921.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.277, pruned_loss=0.06224, ctc_loss=0.1086, over 3297800.83 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:52:29,888 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.267e+02 3.318e+02 3.916e+02 4.621e+02 6.071e+02, threshold=7.833e+02, percent-clipped=0.0 2023-10-09 17:52:40,881 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2829689.3333333335, ans=0.125 2023-10-09 17:52:55,781 INFO [train.py:1031] (2/4) Epoch 14, batch 21650, loss[loss=0.256, simple_loss=0.3032, pruned_loss=0.0776, ctc_loss=0.1339, over 16561.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.2841, pruned_loss=0.0656, ctc_loss=0.1144, over 3303527.90 frames. 
], batch size: 110, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:52:56,071 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2829782.6666666665, ans=0.125 2023-10-09 17:53:07,191 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2829782.6666666665, ans=0.125 2023-10-09 17:53:43,456 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-10-09 17:53:45,682 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2829922.6666666665, ans=0.2 2023-10-09 17:53:47,722 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2829969.3333333335, ans=0.0 2023-10-09 17:53:53,618 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2829969.3333333335, ans=0.0 2023-10-09 17:53:56,091 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0 2023-10-09 17:53:59,381 INFO [train.py:1031] (2/4) Epoch 14, batch 21700, loss[loss=0.2546, simple_loss=0.3161, pruned_loss=0.07198, ctc_loss=0.1227, over 16834.00 frames. ], tot_loss[loss=0.2366, simple_loss=0.2899, pruned_loss=0.068, ctc_loss=0.1185, over 3312758.42 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:54:06,193 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2830016.0, ans=0.125 2023-10-09 17:54:07,320 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2830016.0, ans=0.0 2023-10-09 17:54:08,787 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=2830016.0, ans=15.0 2023-10-09 17:54:16,154 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2830062.6666666665, ans=0.0 2023-10-09 17:54:32,818 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2023-10-09 17:54:34,706 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.610e+02 3.454e+02 3.938e+02 4.640e+02 9.291e+02, threshold=7.877e+02, percent-clipped=1.0 2023-10-09 17:54:35,148 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 17:54:39,208 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2830156.0, ans=0.0 2023-10-09 17:54:58,956 INFO [train.py:1031] (2/4) Epoch 14, batch 21750, loss[loss=0.2722, simple_loss=0.3298, pruned_loss=0.07844, ctc_loss=0.1439, over 16929.00 frames. ], tot_loss[loss=0.2385, simple_loss=0.2949, pruned_loss=0.0675, ctc_loss=0.1177, over 3314590.73 frames. 
], batch size: 309, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 17:55:12,836 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2830296.0, ans=0.2 2023-10-09 17:55:24,194 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2830342.6666666665, ans=0.125 2023-10-09 17:55:35,219 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2830389.3333333335, ans=0.1 2023-10-09 17:56:00,624 INFO [train.py:1031] (2/4) Epoch 14, batch 21800, loss[loss=0.166, simple_loss=0.2508, pruned_loss=0.02919, ctc_loss=0.05678, over 16800.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2892, pruned_loss=0.06365, ctc_loss=0.1108, over 3309007.59 frames. ], batch size: 272, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 17:56:14,688 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.28 vs. limit=22.5 2023-10-09 17:56:19,436 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2830529.3333333335, ans=0.125 2023-10-09 17:56:34,290 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.95 vs. limit=15.0 2023-10-09 17:56:37,328 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+02 2.581e+02 3.060e+02 4.394e+02 8.007e+02, threshold=6.120e+02, percent-clipped=1.0 2023-10-09 17:57:00,883 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2830669.3333333335, ans=0.1 2023-10-09 17:57:03,733 INFO [train.py:1031] (2/4) Epoch 14, batch 21850, loss[loss=0.2116, simple_loss=0.2913, pruned_loss=0.04848, ctc_loss=0.08723, over 16897.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2859, pruned_loss=0.05945, ctc_loss=0.1042, over 3301093.26 frames. ], batch size: 215, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 17:57:38,041 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2830809.3333333335, ans=0.1 2023-10-09 17:57:56,907 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0 2023-10-09 17:57:57,675 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2830902.6666666665, ans=0.0 2023-10-09 17:58:02,834 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.78 vs. limit=10.0 2023-10-09 17:58:03,623 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2830902.6666666665, ans=0.0 2023-10-09 17:58:06,346 INFO [train.py:1031] (2/4) Epoch 14, batch 21900, loss[loss=0.2766, simple_loss=0.3324, pruned_loss=0.08121, ctc_loss=0.1459, over 16878.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2901, pruned_loss=0.06081, ctc_loss=0.1069, over 3305648.93 frames. 
], batch size: 292, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:58:12,283 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2830949.3333333335, ans=0.0 2023-10-09 17:58:47,463 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.236e+02 3.180e+02 3.668e+02 4.478e+02 7.065e+02, threshold=7.335e+02, percent-clipped=3.0 2023-10-09 17:58:47,881 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2831089.3333333335, ans=0.1 2023-10-09 17:58:54,459 WARNING [train.py:1204] (2/4) Exclude cut with ID X0000003684_17524832_S00712_sp1.1 from training. Number of frames (before subsampling): 130. Number of frames (after subsampling): 31. Text: 哒哒哒哒哒哒哒哒哒哒哒哒. Tokens: ['▁', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>', '<0xE5>', '<0x93>', '<0x92>']. Number of tokens: 37 2023-10-09 17:59:01,972 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2831136.0, ans=0.125 2023-10-09 17:59:07,527 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2831136.0, ans=0.0 2023-10-09 17:59:10,948 INFO [train.py:1031] (2/4) Epoch 14, batch 21950, loss[loss=0.3611, simple_loss=0.4121, pruned_loss=0.1149, ctc_loss=0.2012, over 16471.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.299, pruned_loss=0.06665, ctc_loss=0.1167, over 3300122.52 frames. ], batch size: 416, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 17:59:15,925 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2023-10-09 17:59:19,011 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2831182.6666666665, ans=0.125 2023-10-09 17:59:23,437 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2831229.3333333335, ans=0.09899494936611666 2023-10-09 17:59:23,815 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.68 vs. limit=10.0 2023-10-09 17:59:31,625 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2831229.3333333335, ans=0.125 2023-10-09 17:59:37,235 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2831276.0, ans=0.07 2023-10-09 17:59:41,903 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.51 vs. 
limit=12.0 2023-10-09 17:59:42,701 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2831276.0, ans=0.0 2023-10-09 17:59:53,651 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2831322.6666666665, ans=0.125 2023-10-09 18:00:14,566 INFO [train.py:1031] (2/4) Epoch 14, batch 22000, loss[loss=0.261, simple_loss=0.3108, pruned_loss=0.07863, ctc_loss=0.1349, over 16787.00 frames. ], tot_loss[loss=0.2497, simple_loss=0.3081, pruned_loss=0.07089, ctc_loss=0.1237, over 3301658.72 frames. ], batch size: 292, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:00:29,066 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2831462.6666666665, ans=0.0 2023-10-09 18:00:53,675 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2831556.0, ans=0.2 2023-10-09 18:00:55,417 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.792e+02 3.995e+02 5.154e+02 7.072e+02 9.807e+02, threshold=1.031e+03, percent-clipped=19.0 2023-10-09 18:00:56,833 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2831556.0, ans=0.125 2023-10-09 18:01:02,683 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2831556.0, ans=0.1 2023-10-09 18:01:10,950 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2831602.6666666665, ans=0.0 2023-10-09 18:01:16,545 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2023-10-09 18:01:17,530 INFO [train.py:1031] (2/4) Epoch 14, batch 22050, loss[loss=0.2251, simple_loss=0.2519, pruned_loss=0.07285, ctc_loss=0.1313, over 16464.00 frames. ], tot_loss[loss=0.2428, simple_loss=0.2984, pruned_loss=0.06942, ctc_loss=0.1209, over 3304011.82 frames. ], batch size: 418, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:01:22,734 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.80 vs. limit=10.0 2023-10-09 18:01:23,498 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2831649.3333333335, ans=0.0 2023-10-09 18:01:38,171 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2831696.0, ans=0.125 2023-10-09 18:01:47,672 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2831742.6666666665, ans=0.0 2023-10-09 18:01:58,501 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2831789.3333333335, ans=0.125 2023-10-09 18:02:09,708 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2831836.0, ans=0.1 2023-10-09 18:02:22,388 INFO [train.py:1031] (2/4) Epoch 14, batch 22100, loss[loss=0.2764, simple_loss=0.3628, pruned_loss=0.07074, ctc_loss=0.1213, over 16275.00 frames. 
], tot_loss[loss=0.2409, simple_loss=0.2967, pruned_loss=0.06878, ctc_loss=0.1191, over 3303676.02 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:02:23,838 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2831882.6666666665, ans=0.0 2023-10-09 18:02:25,927 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2831882.6666666665, ans=0.125 2023-10-09 18:02:25,973 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2831882.6666666665, ans=0.0 2023-10-09 18:02:42,679 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2831929.3333333335, ans=0.125 2023-10-09 18:02:49,733 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0 2023-10-09 18:02:57,600 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2831976.0, ans=15.0 2023-10-09 18:03:04,527 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.401e+02 3.384e+02 3.750e+02 4.334e+02 8.202e+02, threshold=7.499e+02, percent-clipped=0.0 2023-10-09 18:03:09,188 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2832022.6666666665, ans=0.1 2023-10-09 18:03:10,217 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2832069.3333333335, ans=0.1 2023-10-09 18:03:16,914 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2832069.3333333335, ans=0.0 2023-10-09 18:03:22,966 INFO [train.py:1031] (2/4) Epoch 14, batch 22150, loss[loss=0.2459, simple_loss=0.2992, pruned_loss=0.07008, ctc_loss=0.1312, over 16873.00 frames. ], tot_loss[loss=0.2439, simple_loss=0.2997, pruned_loss=0.06993, ctc_loss=0.1206, over 3308845.72 frames. ], batch size: 242, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:03:28,344 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2832116.0, ans=0.1 2023-10-09 18:03:35,811 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.33 vs. limit=22.5 2023-10-09 18:04:02,781 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.69 vs. limit=15.0 2023-10-09 18:04:14,789 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.60 vs. limit=10.0 2023-10-09 18:04:24,460 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2832349.3333333335, ans=0.125 2023-10-09 18:04:25,079 INFO [train.py:1031] (2/4) Epoch 14, batch 22200, loss[loss=0.2041, simple_loss=0.2861, pruned_loss=0.04397, ctc_loss=0.0857, over 16762.00 frames. ], tot_loss[loss=0.2441, simple_loss=0.2997, pruned_loss=0.07001, ctc_loss=0.1212, over 3307141.77 frames. 
], batch size: 271, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:04:41,821 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=22.5 2023-10-09 18:04:46,162 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.70 vs. limit=15.0 2023-10-09 18:05:06,088 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.358e+02 3.127e+02 3.515e+02 4.166e+02 8.841e+02, threshold=7.030e+02, percent-clipped=1.0 2023-10-09 18:05:06,406 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2832489.3333333335, ans=0.0 2023-10-09 18:05:09,584 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2832489.3333333335, ans=0.1 2023-10-09 18:05:13,151 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.15 vs. limit=22.5 2023-10-09 18:05:24,123 INFO [train.py:1031] (2/4) Epoch 14, batch 22250, loss[loss=0.239, simple_loss=0.2886, pruned_loss=0.07036, ctc_loss=0.1216, over 16857.00 frames. ], tot_loss[loss=0.2424, simple_loss=0.2994, pruned_loss=0.06883, ctc_loss=0.1195, over 3305291.71 frames. ], batch size: 164, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:05:52,202 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=22.5 2023-10-09 18:05:59,903 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2832722.6666666665, ans=0.125 2023-10-09 18:06:25,301 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.82 vs. limit=10.0 2023-10-09 18:06:25,776 INFO [train.py:1031] (2/4) Epoch 14, batch 22300, loss[loss=0.2276, simple_loss=0.278, pruned_loss=0.0662, ctc_loss=0.1117, over 16775.00 frames. ], tot_loss[loss=0.2429, simple_loss=0.2986, pruned_loss=0.06937, ctc_loss=0.1208, over 3306505.99 frames. ], batch size: 121, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:06:26,168 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2832816.0, ans=0.125 2023-10-09 18:06:41,626 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2832862.6666666665, ans=0.2 2023-10-09 18:07:05,957 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2832956.0, ans=0.2 2023-10-09 18:07:07,706 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.458e+02 3.881e+02 4.380e+02 7.162e+02, threshold=7.762e+02, percent-clipped=2.0 2023-10-09 18:07:25,985 INFO [train.py:1031] (2/4) Epoch 14, batch 22350, loss[loss=0.2136, simple_loss=0.2688, pruned_loss=0.05922, ctc_loss=0.09998, over 16823.00 frames. ], tot_loss[loss=0.2432, simple_loss=0.2974, pruned_loss=0.07003, ctc_loss=0.1221, over 3300922.61 frames. 
], batch size: 176, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:07:29,065 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2833049.3333333335, ans=0.0 2023-10-09 18:07:33,698 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=2833049.3333333335, ans=0.02 2023-10-09 18:07:40,580 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=12.0 2023-10-09 18:08:13,238 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2833189.3333333335, ans=0.125 2023-10-09 18:08:14,388 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2833236.0, ans=0.125 2023-10-09 18:08:27,605 INFO [train.py:1031] (2/4) Epoch 14, batch 22400, loss[loss=0.3038, simple_loss=0.3675, pruned_loss=0.08797, ctc_loss=0.1606, over 16669.00 frames. ], tot_loss[loss=0.2436, simple_loss=0.2996, pruned_loss=0.06946, ctc_loss=0.1214, over 3305389.41 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:08:42,303 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2833329.3333333335, ans=0.125 2023-10-09 18:08:54,847 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2833376.0, ans=0.125 2023-10-09 18:09:08,190 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.19 vs. limit=15.0 2023-10-09 18:09:11,653 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.558e+02 3.421e+02 3.975e+02 5.211e+02 8.186e+02, threshold=7.949e+02, percent-clipped=2.0 2023-10-09 18:09:29,996 INFO [train.py:1031] (2/4) Epoch 14, batch 22450, loss[loss=0.2374, simple_loss=0.2937, pruned_loss=0.06771, ctc_loss=0.1141, over 16215.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.3025, pruned_loss=0.06981, ctc_loss=0.1221, over 3301342.43 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:09:33,199 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=12.0 2023-10-09 18:09:35,187 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2833516.0, ans=0.125 2023-10-09 18:09:36,296 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2833516.0, ans=0.1 2023-10-09 18:09:55,396 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2833609.3333333335, ans=0.125 2023-10-09 18:10:23,557 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2833702.6666666665, ans=0.0 2023-10-09 18:10:31,447 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.09 vs. limit=10.0 2023-10-09 18:10:31,932 INFO [train.py:1031] (2/4) Epoch 14, batch 22500, loss[loss=0.2248, simple_loss=0.2689, pruned_loss=0.06611, ctc_loss=0.1214, over 16788.00 frames. 
], tot_loss[loss=0.2448, simple_loss=0.2997, pruned_loss=0.07031, ctc_loss=0.123, over 3300629.46 frames. ], batch size: 292, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:11:17,897 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.767e+02 3.228e+02 3.590e+02 3.967e+02 7.433e+02, threshold=7.180e+02, percent-clipped=0.0 2023-10-09 18:11:32,584 INFO [train.py:1031] (2/4) Epoch 14, batch 22550, loss[loss=0.1765, simple_loss=0.226, pruned_loss=0.0474, ctc_loss=0.0808, over 16784.00 frames. ], tot_loss[loss=0.2381, simple_loss=0.2906, pruned_loss=0.06871, ctc_loss=0.1202, over 3295322.99 frames. ], batch size: 141, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:11:36,257 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2833982.6666666665, ans=0.125 2023-10-09 18:11:57,012 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2023-10-09 18:11:58,464 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:12:02,935 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=22.5 2023-10-09 18:12:33,380 INFO [train.py:1031] (2/4) Epoch 14, batch 22600, loss[loss=0.255, simple_loss=0.2944, pruned_loss=0.07861, ctc_loss=0.1461, over 16581.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2845, pruned_loss=0.06451, ctc_loss=0.1134, over 3297775.09 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:12:41,843 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.21 vs. limit=22.5 2023-10-09 18:13:19,599 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2834356.0, ans=0.0 2023-10-09 18:13:20,266 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.347e+02 2.912e+02 3.400e+02 4.128e+02 6.956e+02, threshold=6.801e+02, percent-clipped=0.0 2023-10-09 18:13:26,072 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:13:33,332 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2834449.3333333335, ans=0.0 2023-10-09 18:13:34,041 INFO [train.py:1031] (2/4) Epoch 14, batch 22650, loss[loss=0.2397, simple_loss=0.2801, pruned_loss=0.07538, ctc_loss=0.1214, over 16934.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.279, pruned_loss=0.06312, ctc_loss=0.1107, over 3296884.91 frames. ], batch size: 78, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:13:46,258 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2834496.0, ans=0.07 2023-10-09 18:14:16,891 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=2834589.3333333335, ans=0.125 2023-10-09 18:14:35,171 INFO [train.py:1031] (2/4) Epoch 14, batch 22700, loss[loss=0.2503, simple_loss=0.3124, pruned_loss=0.06999, ctc_loss=0.1207, over 16744.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2792, pruned_loss=0.0646, ctc_loss=0.1132, over 3305558.27 frames. 
], batch size: 95, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:14:37,608 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2834682.6666666665, ans=0.04949747468305833 2023-10-09 18:14:37,969 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.35 vs. limit=22.5 2023-10-09 18:14:43,489 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2834682.6666666665, ans=0.0 2023-10-09 18:14:49,457 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.44 vs. limit=12.0 2023-10-09 18:14:50,779 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2834729.3333333335, ans=0.125 2023-10-09 18:14:51,887 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2834729.3333333335, ans=0.1 2023-10-09 18:14:58,883 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2834776.0, ans=0.2 2023-10-09 18:15:16,873 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2834822.6666666665, ans=0.2 2023-10-09 18:15:24,588 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.696e+02 3.390e+02 4.032e+02 4.588e+02 8.428e+02, threshold=8.064e+02, percent-clipped=2.0 2023-10-09 18:15:24,876 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:15:25,989 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2834869.3333333335, ans=0.2 2023-10-09 18:15:27,004 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2834869.3333333335, ans=0.125 2023-10-09 18:15:37,583 INFO [train.py:1031] (2/4) Epoch 14, batch 22750, loss[loss=0.2609, simple_loss=0.3019, pruned_loss=0.08173, ctc_loss=0.1412, over 16758.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2846, pruned_loss=0.06788, ctc_loss=0.1188, over 3307163.92 frames. ], batch size: 140, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:15:52,003 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2834962.6666666665, ans=0.125 2023-10-09 18:16:01,887 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2835009.3333333335, ans=0.07 2023-10-09 18:16:39,368 INFO [train.py:1031] (2/4) Epoch 14, batch 22800, loss[loss=0.2564, simple_loss=0.3141, pruned_loss=0.07295, ctc_loss=0.132, over 16823.00 frames. ], tot_loss[loss=0.2397, simple_loss=0.2897, pruned_loss=0.07033, ctc_loss=0.1228, over 3303890.48 frames. 
], batch size: 309, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:16:42,726 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2835149.3333333335, ans=0.0 2023-10-09 18:16:59,871 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2835196.0, ans=0.125 2023-10-09 18:16:59,894 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2835196.0, ans=0.0 2023-10-09 18:17:05,322 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2835242.6666666665, ans=0.125 2023-10-09 18:17:13,577 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2835242.6666666665, ans=0.2 2023-10-09 18:17:21,179 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2835289.3333333335, ans=0.125 2023-10-09 18:17:23,114 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=2835289.3333333335, ans=0.5 2023-10-09 18:17:24,237 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:17:28,724 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.550e+02 3.223e+02 3.755e+02 4.885e+02 7.657e+02, threshold=7.509e+02, percent-clipped=0.0 2023-10-09 18:17:39,518 INFO [train.py:1031] (2/4) Epoch 14, batch 22850, loss[loss=0.2163, simple_loss=0.2699, pruned_loss=0.05974, ctc_loss=0.1083, over 16687.00 frames. ], tot_loss[loss=0.2392, simple_loss=0.2925, pruned_loss=0.06883, ctc_loss=0.1205, over 3310022.55 frames. ], batch size: 151, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:17:44,544 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:18:01,910 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2835429.3333333335, ans=0.125 2023-10-09 18:18:12,667 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=22.5 2023-10-09 18:18:19,466 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=15.0 2023-10-09 18:18:24,033 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.83 vs. limit=10.0 2023-10-09 18:18:29,657 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2835569.3333333335, ans=0.0 2023-10-09 18:18:35,956 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2835569.3333333335, ans=0.1 2023-10-09 18:18:37,091 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2835569.3333333335, ans=10.0 2023-10-09 18:18:38,787 INFO [train.py:1031] (2/4) Epoch 14, batch 22900, loss[loss=0.2064, simple_loss=0.2692, pruned_loss=0.05453, ctc_loss=0.08655, over 11134.00 frames. 
], tot_loss[loss=0.2365, simple_loss=0.2912, pruned_loss=0.06739, ctc_loss=0.1178, over 3301157.34 frames. ], batch size: 39, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:18:55,027 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2835662.6666666665, ans=0.125 2023-10-09 18:19:03,564 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2835709.3333333335, ans=0.0 2023-10-09 18:19:12,625 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2835709.3333333335, ans=0.2 2023-10-09 18:19:21,054 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:19:21,059 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2835756.0, ans=0.125 2023-10-09 18:19:29,094 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.399e+02 3.037e+02 3.390e+02 3.855e+02 5.718e+02, threshold=6.781e+02, percent-clipped=0.0 2023-10-09 18:19:29,555 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2835802.6666666665, ans=0.5 2023-10-09 18:19:40,762 INFO [train.py:1031] (2/4) Epoch 14, batch 22950, loss[loss=0.1907, simple_loss=0.2241, pruned_loss=0.05722, ctc_loss=0.107, over 15460.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2878, pruned_loss=0.06647, ctc_loss=0.1158, over 3296950.51 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:20:01,970 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2835896.0, ans=0.125 2023-10-09 18:20:16,477 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2023-10-09 18:20:23,765 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2835989.3333333335, ans=0.95 2023-10-09 18:20:25,388 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2835989.3333333335, ans=0.125 2023-10-09 18:20:42,887 INFO [train.py:1031] (2/4) Epoch 14, batch 23000, loss[loss=0.2793, simple_loss=0.3389, pruned_loss=0.08054, ctc_loss=0.1464, over 16620.00 frames. ], tot_loss[loss=0.2321, simple_loss=0.289, pruned_loss=0.06487, ctc_loss=0.1135, over 3274151.37 frames. 
], batch size: 351, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:20:45,365 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2836082.6666666665, ans=0.2 2023-10-09 18:21:11,283 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2836176.0, ans=0.2 2023-10-09 18:21:25,819 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2836222.6666666665, ans=0.0 2023-10-09 18:21:27,016 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2836222.6666666665, ans=0.0 2023-10-09 18:21:36,007 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+02 3.336e+02 3.961e+02 4.906e+02 8.428e+02, threshold=7.922e+02, percent-clipped=4.0 2023-10-09 18:21:41,461 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=2836269.3333333335, ans=12.0 2023-10-09 18:21:45,235 INFO [train.py:1031] (2/4) Epoch 14, batch 23050, loss[loss=0.2687, simple_loss=0.3191, pruned_loss=0.08338, ctc_loss=0.1291, over 16514.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.294, pruned_loss=0.06716, ctc_loss=0.1175, over 3272149.87 frames. ], batch size: 110, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:22:27,927 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2836456.0, ans=0.125 2023-10-09 18:22:27,964 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2836456.0, ans=0.125 2023-10-09 18:22:47,971 INFO [train.py:1031] (2/4) Epoch 14, batch 23100, loss[loss=0.1985, simple_loss=0.2531, pruned_loss=0.05365, ctc_loss=0.09128, over 16770.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2907, pruned_loss=0.06406, ctc_loss=0.1128, over 3278316.79 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:22:50,478 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.95 vs. limit=10.0 2023-10-09 18:23:01,778 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2836596.0, ans=0.2 2023-10-09 18:23:02,153 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=12.0 2023-10-09 18:23:12,991 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.39 vs. limit=22.5 2023-10-09 18:23:20,597 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2836642.6666666665, ans=0.035 2023-10-09 18:23:23,157 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.64 vs. 
limit=22.5 2023-10-09 18:23:23,708 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:23:30,931 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2836689.3333333335, ans=0.0 2023-10-09 18:23:41,038 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.931e+02 3.346e+02 4.278e+02 6.701e+02, threshold=6.692e+02, percent-clipped=0.0 2023-10-09 18:23:42,469 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2836736.0, ans=0.1 2023-10-09 18:23:50,130 INFO [train.py:1031] (2/4) Epoch 14, batch 23150, loss[loss=0.199, simple_loss=0.2603, pruned_loss=0.05189, ctc_loss=0.08477, over 16956.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2854, pruned_loss=0.0627, ctc_loss=0.1107, over 3284551.73 frames. ], batch size: 86, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:24:01,604 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2836829.3333333335, ans=0.0 2023-10-09 18:24:11,279 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2836829.3333333335, ans=0.1 2023-10-09 18:24:19,987 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2836876.0, ans=10.0 2023-10-09 18:24:29,169 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2836922.6666666665, ans=0.2 2023-10-09 18:24:33,329 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.04 vs. limit=15.0 2023-10-09 18:24:43,567 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2836969.3333333335, ans=0.125 2023-10-09 18:24:50,636 INFO [train.py:1031] (2/4) Epoch 14, batch 23200, loss[loss=0.2381, simple_loss=0.294, pruned_loss=0.06603, ctc_loss=0.1253, over 16752.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2818, pruned_loss=0.06243, ctc_loss=0.1103, over 3287161.09 frames. ], batch size: 130, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 18:24:52,993 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.73 vs. limit=15.0 2023-10-09 18:25:02,143 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2837062.6666666665, ans=0.125 2023-10-09 18:25:21,061 INFO [scaling.py:979] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.67 vs. 
limit=8.0 2023-10-09 18:25:29,283 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2837156.0, ans=0.2 2023-10-09 18:25:29,361 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2837156.0, ans=0.2 2023-10-09 18:25:47,045 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.426e+02 3.050e+02 3.396e+02 3.920e+02 6.096e+02, threshold=6.792e+02, percent-clipped=0.0 2023-10-09 18:25:50,099 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2837202.6666666665, ans=0.035 2023-10-09 18:25:53,633 INFO [train.py:1031] (2/4) Epoch 14, batch 23250, loss[loss=0.1968, simple_loss=0.2523, pruned_loss=0.05233, ctc_loss=0.09168, over 16815.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.281, pruned_loss=0.06177, ctc_loss=0.1091, over 3297528.92 frames. ], batch size: 176, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:26:14,214 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0 2023-10-09 18:26:18,719 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2837296.0, ans=0.1 2023-10-09 18:26:42,247 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=15.0 2023-10-09 18:26:57,310 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2837436.0, ans=0.0 2023-10-09 18:26:59,151 INFO [train.py:1031] (2/4) Epoch 14, batch 23300, loss[loss=0.2398, simple_loss=0.3088, pruned_loss=0.06195, ctc_loss=0.1173, over 16816.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.2772, pruned_loss=0.06147, ctc_loss=0.1088, over 3298070.66 frames. ], batch size: 309, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:27:06,325 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2837482.6666666665, ans=0.0 2023-10-09 18:27:31,265 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2837576.0, ans=0.0 2023-10-09 18:27:35,962 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2837622.6666666665, ans=0.125 2023-10-09 18:27:44,244 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=22.5 2023-10-09 18:27:57,269 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.313e+02 3.134e+02 3.806e+02 4.608e+02 8.711e+02, threshold=7.613e+02, percent-clipped=4.0 2023-10-09 18:28:01,936 INFO [train.py:1031] (2/4) Epoch 14, batch 23350, loss[loss=0.1982, simple_loss=0.2477, pruned_loss=0.056, ctc_loss=0.09184, over 16692.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2744, pruned_loss=0.06014, ctc_loss=0.1066, over 3300861.33 frames. 
], batch size: 201, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:28:11,171 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=2837716.0, ans=0.0 2023-10-09 18:28:18,776 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2837762.6666666665, ans=0.125 2023-10-09 18:28:43,321 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=12.0 2023-10-09 18:29:03,090 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2837949.3333333335, ans=0.125 2023-10-09 18:29:03,743 INFO [train.py:1031] (2/4) Epoch 14, batch 23400, loss[loss=0.2176, simple_loss=0.2761, pruned_loss=0.05875, ctc_loss=0.1039, over 16782.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.2711, pruned_loss=0.06018, ctc_loss=0.1061, over 3302208.33 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:29:28,722 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2838042.6666666665, ans=0.1 2023-10-09 18:29:33,492 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.78 vs. limit=6.0 2023-10-09 18:29:41,701 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2838089.3333333335, ans=0.0 2023-10-09 18:29:47,572 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2838089.3333333335, ans=0.0 2023-10-09 18:30:00,414 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 3.093e+02 3.637e+02 4.189e+02 1.057e+03, threshold=7.274e+02, percent-clipped=1.0 2023-10-09 18:30:04,496 INFO [train.py:1031] (2/4) Epoch 14, batch 23450, loss[loss=0.2624, simple_loss=0.2742, pruned_loss=0.09351, ctc_loss=0.1589, over 16640.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2682, pruned_loss=0.06095, ctc_loss=0.1072, over 3306191.18 frames. ], batch size: 386, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:30:35,327 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2838276.0, ans=0.0 2023-10-09 18:30:51,477 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.65 vs. limit=22.5 2023-10-09 18:31:00,010 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2838369.3333333335, ans=0.05 2023-10-09 18:31:05,809 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2838416.0, ans=0.0 2023-10-09 18:31:06,585 INFO [train.py:1031] (2/4) Epoch 14, batch 23500, loss[loss=0.2099, simple_loss=0.2557, pruned_loss=0.06181, ctc_loss=0.1012, over 16725.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2678, pruned_loss=0.0619, ctc_loss=0.1086, over 3295048.02 frames. 
], batch size: 130, lr: 2.55e-03, grad_scale: 8.0 2023-10-09 18:31:10,042 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2838416.0, ans=0.125 2023-10-09 18:31:32,587 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2838509.3333333335, ans=0.0 2023-10-09 18:31:45,618 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.33 vs. limit=10.0 2023-10-09 18:32:02,996 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2838602.6666666665, ans=0.125 2023-10-09 18:32:05,665 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.576e+02 3.415e+02 3.721e+02 4.306e+02 1.300e+03, threshold=7.442e+02, percent-clipped=1.0 2023-10-09 18:32:08,405 INFO [train.py:1031] (2/4) Epoch 14, batch 23550, loss[loss=0.2195, simple_loss=0.2617, pruned_loss=0.06505, ctc_loss=0.118, over 16822.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2726, pruned_loss=0.06366, ctc_loss=0.1116, over 3301132.45 frames. ], batch size: 202, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:32:08,814 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2838649.3333333335, ans=0.0 2023-10-09 18:32:11,756 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2838649.3333333335, ans=0.125 2023-10-09 18:32:11,774 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2838649.3333333335, ans=0.04949747468305833 2023-10-09 18:32:19,125 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2838696.0, ans=0.0 2023-10-09 18:32:42,893 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2838742.6666666665, ans=0.125 2023-10-09 18:33:05,841 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2838836.0, ans=0.125 2023-10-09 18:33:07,017 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2838836.0, ans=0.0 2023-10-09 18:33:07,043 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2838836.0, ans=0.2 2023-10-09 18:33:08,885 INFO [train.py:1031] (2/4) Epoch 14, batch 23600, loss[loss=0.1921, simple_loss=0.2389, pruned_loss=0.05302, ctc_loss=0.09818, over 16759.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2697, pruned_loss=0.06319, ctc_loss=0.1107, over 3302778.92 frames. 
], batch size: 151, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:33:16,311 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2838882.6666666665, ans=0.0 2023-10-09 18:33:24,271 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2838929.3333333335, ans=0.125 2023-10-09 18:33:32,873 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2838976.0, ans=0.1 2023-10-09 18:33:57,218 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2839069.3333333335, ans=0.2 2023-10-09 18:33:58,379 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2839069.3333333335, ans=0.0 2023-10-09 18:34:01,099 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2839069.3333333335, ans=0.0 2023-10-09 18:34:09,459 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.368e+02 2.978e+02 3.333e+02 3.973e+02 8.640e+02, threshold=6.667e+02, percent-clipped=1.0 2023-10-09 18:34:10,553 INFO [train.py:1031] (2/4) Epoch 14, batch 23650, loss[loss=0.2196, simple_loss=0.2696, pruned_loss=0.06419, ctc_loss=0.103, over 16730.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2717, pruned_loss=0.06229, ctc_loss=0.1089, over 3299942.10 frames. ], batch size: 130, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:34:17,851 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2839116.0, ans=0.05 2023-10-09 18:34:23,543 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2839162.6666666665, ans=0.125 2023-10-09 18:34:24,608 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2839162.6666666665, ans=0.07 2023-10-09 18:34:25,696 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:34:41,724 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 18:35:11,976 INFO [train.py:1031] (2/4) Epoch 14, batch 23700, loss[loss=0.1992, simple_loss=0.2582, pruned_loss=0.05279, ctc_loss=0.08655, over 16825.00 frames. ], tot_loss[loss=0.2147, simple_loss=0.2715, pruned_loss=0.05839, ctc_loss=0.1027, over 3294653.78 frames. 
], batch size: 121, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:35:16,614 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2839349.3333333335, ans=0.0 2023-10-09 18:35:29,085 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2839396.0, ans=0.05 2023-10-09 18:35:46,491 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2839489.3333333335, ans=0.2 2023-10-09 18:35:49,738 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2839489.3333333335, ans=0.125 2023-10-09 18:35:51,197 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2839489.3333333335, ans=0.1 2023-10-09 18:36:00,776 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2839536.0, ans=0.125 2023-10-09 18:36:11,208 INFO [train.py:1031] (2/4) Epoch 14, batch 23750, loss[loss=0.2248, simple_loss=0.3085, pruned_loss=0.05039, ctc_loss=0.1009, over 16806.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.2753, pruned_loss=0.05851, ctc_loss=0.1034, over 3293520.95 frames. ], batch size: 328, lr: 2.55e-03, grad_scale: 1.0 2023-10-09 18:36:12,957 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+02 2.785e+02 3.356e+02 4.379e+02 6.615e+02, threshold=6.712e+02, percent-clipped=0.0 2023-10-09 18:36:16,325 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2839582.6666666665, ans=0.125 2023-10-09 18:36:16,407 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=2839582.6666666665, ans=0.95 2023-10-09 18:36:18,337 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2839582.6666666665, ans=10.0 2023-10-09 18:36:34,109 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=12.0 2023-10-09 18:36:46,657 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2839722.6666666665, ans=0.0 2023-10-09 18:36:48,880 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=2839722.6666666665, ans=0.2 2023-10-09 18:36:56,885 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2839722.6666666665, ans=0.1 2023-10-09 18:37:11,013 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2839816.0, ans=0.1 2023-10-09 18:37:11,436 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0 2023-10-09 18:37:11,735 INFO [train.py:1031] (2/4) Epoch 14, batch 23800, loss[loss=0.2011, simple_loss=0.2768, pruned_loss=0.04589, ctc_loss=0.08379, over 16826.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2755, pruned_loss=0.05554, ctc_loss=0.09881, over 3297971.48 frames. 
2023-10-09 18:37:40,793 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0
2023-10-09 18:37:45,420 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.89 vs. limit=22.5
2023-10-09 18:37:55,016 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2839956.0, ans=0.5
2023-10-09 18:38:12,933 INFO [train.py:1031] (2/4) Epoch 14, batch 23850, loss[loss=0.2107, simple_loss=0.2783, pruned_loss=0.05259, ctc_loss=0.09463, over 16806.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2825, pruned_loss=0.05601, ctc_loss=0.1001, over 3305995.40 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:38:14,610 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 3.207e+02 4.081e+02 4.991e+02 8.849e+02, threshold=8.163e+02, percent-clipped=8.0
2023-10-09 18:38:30,188 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2840096.0, ans=0.2
2023-10-09 18:38:33,824 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2840096.0, ans=0.125
2023-10-09 18:38:41,516 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.93 vs. limit=15.0
2023-10-09 18:38:47,424 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2840142.6666666665, ans=0.1
2023-10-09 18:38:47,438 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2840142.6666666665, ans=0.1
2023-10-09 18:38:51,350 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2840189.3333333335, ans=0.1
2023-10-09 18:39:13,691 INFO [train.py:1031] (2/4) Epoch 14, batch 23900, loss[loss=0.2666, simple_loss=0.2996, pruned_loss=0.08564, ctc_loss=0.1555, over 16843.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.2852, pruned_loss=0.05824, ctc_loss=0.1042, over 3313341.90 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:39:23,236 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2840282.6666666665, ans=0.0
2023-10-09 18:39:29,517 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.03 vs. limit=15.0
2023-10-09 18:39:33,724 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2840329.3333333335, ans=0.04949747468305833
2023-10-09 18:39:50,710 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0
2023-10-09 18:40:15,815 INFO [train.py:1031] (2/4) Epoch 14, batch 23950, loss[loss=0.2114, simple_loss=0.2356, pruned_loss=0.06867, ctc_loss=0.1246, over 15376.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2832, pruned_loss=0.06022, ctc_loss=0.1069, over 3311969.16 frames. ], batch size: 526, lr: 2.55e-03, grad_scale: 4.0
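
The optim.py:471 entries summarize the distribution of recent gradient norms (five quantiles from min to max), the clipping threshold in force, and the percentage of recent batches that were clipped. The printed numbers suggest threshold = Clipping_scale x median (e.g. 8.163e+02 = 2.0 x 4.081e+02 just above); the window size and exact bookkeeping in this sketch are assumptions.

    import numpy as np

    # Hedged sketch of grad-norm statistics in the spirit of optim.py:471:
    # threshold = clipping_scale * median over a recent window of norms.
    def grad_norm_stats(recent_norms, clipping_scale=2.0):
        qs = np.quantile(recent_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = clipping_scale * qs[2]
        percent_clipped = 100.0 * np.mean(recent_norms > threshold)
        return qs, threshold, percent_clipped

    norms = np.random.default_rng(0).lognormal(mean=5.8, sigma=0.35, size=128)
    qs, thr, pct = grad_norm_stats(norms)
    print("quartiles", qs, "threshold", thr, "percent-clipped", pct)
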
2023-10-09 18:40:16,059 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2840516.0, ans=0.125
2023-10-09 18:40:16,838 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.526e+02 3.283e+02 3.829e+02 4.670e+02 8.731e+02, threshold=7.659e+02, percent-clipped=1.0
2023-10-09 18:40:29,207 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2840562.6666666665, ans=0.0
2023-10-09 18:40:38,623 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2840609.3333333335, ans=0.125
2023-10-09 18:40:47,331 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2840609.3333333335, ans=0.125
2023-10-09 18:41:02,771 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2840702.6666666665, ans=0.1
2023-10-09 18:41:14,518 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2840749.3333333335, ans=0.2
2023-10-09 18:41:15,727 INFO [train.py:1031] (2/4) Epoch 14, batch 24000, loss[loss=0.2277, simple_loss=0.2951, pruned_loss=0.05901, ctc_loss=0.1056, over 16944.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2826, pruned_loss=0.062, ctc_loss=0.1096, over 3305237.94 frames. ], batch size: 228, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:41:15,728 INFO [train.py:1054] (2/4) Computing validation loss
2023-10-09 18:41:33,416 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2354, simple_loss=0.3014, pruned_loss=0.06541, ctc_loss=0.09632, over 1796401.00 frames.
2023-10-09 18:41:33,417 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB
2023-10-09 18:41:48,495 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2840796.0, ans=0.1
2023-10-09 18:42:09,014 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0
2023-10-09 18:42:28,008 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2840936.0, ans=0.125
2023-10-09 18:42:36,292 INFO [train.py:1031] (2/4) Epoch 14, batch 24050, loss[loss=0.2798, simple_loss=0.3267, pruned_loss=0.08601, ctc_loss=0.1521, over 16709.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2871, pruned_loss=0.06246, ctc_loss=0.1107, over 3300356.75 frames. ], batch size: 272, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:42:40,008 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.406e+02 3.196e+02 3.829e+02 4.589e+02 8.519e+02, threshold=7.658e+02, percent-clipped=2.0
2023-10-09 18:43:17,300 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0
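
The block above (train.py:1054/1063/1064) runs a full validation pass mid-epoch and reports peak GPU memory. torch.cuda.max_memory_allocated is the standard PyTorch call for that figure; the loop below is a generic sketch with placeholder model, loader, and loss function, not the actual train.py code.

    import torch

    # Hedged sketch of a validation pass plus the peak-memory report.
    # `model`, `valid_loader`, and `compute_loss` are placeholders.
    def run_validation(model, valid_loader, compute_loss, device="cuda:2"):
        model.eval()
        loss_sum, frame_sum = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch)
                loss_sum += loss.item() * num_frames
                frame_sum += num_frames
        model.train()
        mb = torch.cuda.max_memory_allocated(torch.device(device)) // (1024 * 1024)
        print(f"validation: loss={loss_sum / frame_sum:.4f}, over {frame_sum:.2f} frames.")
        print(f"Maximum memory allocated so far is {mb}MB")
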
2023-10-09 18:43:21,460 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2841122.6666666665, ans=0.1
2023-10-09 18:43:24,211 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2841122.6666666665, ans=0.125
2023-10-09 18:43:24,423 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=12.0
2023-10-09 18:43:28,577 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2841169.3333333335, ans=0.125
2023-10-09 18:43:37,901 INFO [train.py:1031] (2/4) Epoch 14, batch 24100, loss[loss=0.2113, simple_loss=0.2788, pruned_loss=0.05343, ctc_loss=0.09224, over 16658.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2912, pruned_loss=0.06533, ctc_loss=0.116, over 3305583.10 frames. ], batch size: 102, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:43:41,978 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2841216.0, ans=0.0
2023-10-09 18:43:48,101 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2841216.0, ans=0.0
2023-10-09 18:43:50,799 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2841262.6666666665, ans=0.125
2023-10-09 18:44:16,303 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2841356.0, ans=0.125
2023-10-09 18:44:20,174 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2841356.0, ans=0.125
2023-10-09 18:44:36,541 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=2841402.6666666665, ans=10.0
2023-10-09 18:44:37,617 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2841402.6666666665, ans=0.125
2023-10-09 18:44:39,359 INFO [train.py:1031] (2/4) Epoch 14, batch 24150, loss[loss=0.2164, simple_loss=0.2712, pruned_loss=0.06084, ctc_loss=0.09979, over 16956.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2839, pruned_loss=0.0616, ctc_loss=0.1096, over 3298587.90 frames. ], batch size: 78, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:44:43,201 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 3.029e+02 3.494e+02 3.950e+02 7.485e+02, threshold=6.988e+02, percent-clipped=0.0
2023-10-09 18:45:14,928 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2841542.6666666665, ans=0.0
2023-10-09 18:45:17,054 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2841589.3333333335, ans=0.2
2023-10-09 18:45:22,932 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2841589.3333333335, ans=0.125
2023-10-09 18:45:32,662 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2841636.0, ans=0.2
2023-10-09 18:45:42,160 INFO [train.py:1031] (2/4) Epoch 14, batch 24200, loss[loss=0.1718, simple_loss=0.2342, pruned_loss=0.04022, ctc_loss=0.07263, over 16784.00 frames. ], tot_loss[loss=0.2197, simple_loss=0.2808, pruned_loss=0.05839, ctc_loss=0.1043, over 3301288.20 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:45:42,457 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2841682.6666666665, ans=0.1
2023-10-09 18:45:45,759 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2841682.6666666665, ans=0.125
2023-10-09 18:45:45,783 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2841682.6666666665, ans=0.2
2023-10-09 18:45:45,811 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2841682.6666666665, ans=0.125
2023-10-09 18:45:50,653 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2841682.6666666665, ans=0.0
2023-10-09 18:46:05,151 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2841776.0, ans=0.1
2023-10-09 18:46:10,582 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2841776.0, ans=0.04949747468305833
2023-10-09 18:46:19,554 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2841822.6666666665, ans=0.1
2023-10-09 18:46:35,827 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.60 vs. limit=15.0
2023-10-09 18:46:43,490 INFO [train.py:1031] (2/4) Epoch 14, batch 24250, loss[loss=0.2181, simple_loss=0.2673, pruned_loss=0.06333, ctc_loss=0.1056, over 16743.00 frames. ], tot_loss[loss=0.22, simple_loss=0.2795, pruned_loss=0.05914, ctc_loss=0.1053, over 3298035.60 frames. ], batch size: 151, lr: 2.55e-03, grad_scale: 2.0
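
In every train.py:1031 entry, loss[...] is the current batch weighted by its frame count and tot_loss[...] is a running frame-weighted aggregate (here hovering around 3.3M frames). A minimal sketch of such an aggregate follows; the exponential decay is an assumption, as a sliding window would produce similar logs.

    # Hedged sketch of a frame-weighted running loss like tot_loss above;
    # the decay constant is an assumption.
    class RunningLoss:
        def __init__(self, decay=0.999):
            self.decay = decay
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, batch_loss, num_frames):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
            self.frame_sum = self.decay * self.frame_sum + num_frames

        @property
        def value(self):
            return self.loss_sum / max(self.frame_sum, 1.0)

    tot = RunningLoss()
    tot.update(0.2181, 16743.0)  # figures from the batch 24250 entry above
    print(f"tot_loss[loss={tot.value:.4f}, over {tot.frame_sum:.2f} frames.]")
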
2023-10-09 18:46:48,054 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2841916.0, ans=0.125
2023-10-09 18:46:49,469 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.179e+02 2.981e+02 3.499e+02 4.269e+02 8.354e+02, threshold=6.999e+02, percent-clipped=3.0
2023-10-09 18:47:23,499 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2842056.0, ans=0.125
2023-10-09 18:47:46,789 INFO [train.py:1031] (2/4) Epoch 14, batch 24300, loss[loss=0.2277, simple_loss=0.3046, pruned_loss=0.05566, ctc_loss=0.09853, over 16922.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2849, pruned_loss=0.06264, ctc_loss=0.1113, over 3301443.75 frames. ], batch size: 292, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:48:11,847 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.48 vs. limit=10.0
2023-10-09 18:48:42,714 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2842336.0, ans=0.125
2023-10-09 18:48:48,965 INFO [train.py:1031] (2/4) Epoch 14, batch 24350, loss[loss=0.2212, simple_loss=0.2691, pruned_loss=0.06406, ctc_loss=0.1129, over 16753.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2867, pruned_loss=0.06297, ctc_loss=0.112, over 3295006.15 frames. ], batch size: 151, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:48:55,800 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.629e+02 3.451e+02 4.035e+02 4.756e+02 1.145e+03, threshold=8.070e+02, percent-clipped=2.0
2023-10-09 18:48:58,181 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2842382.6666666665, ans=0.0
2023-10-09 18:49:22,284 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2842476.0, ans=0.125
2023-10-09 18:49:48,145 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.15 vs. limit=15.0
2023-10-09 18:49:49,180 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2842616.0, ans=0.04949747468305833
2023-10-09 18:49:49,954 INFO [train.py:1031] (2/4) Epoch 14, batch 24400, loss[loss=0.2152, simple_loss=0.2811, pruned_loss=0.05468, ctc_loss=0.09974, over 16948.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2858, pruned_loss=0.06407, ctc_loss=0.1137, over 3303597.61 frames. ], batch size: 243, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:49:53,319 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2842616.0, ans=0.125
2023-10-09 18:50:10,240 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.33 vs. limit=15.0
2023-10-09 18:50:27,681 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2842756.0, ans=0.1
2023-10-09 18:50:50,581 INFO [train.py:1031] (2/4) Epoch 14, batch 24450, loss[loss=0.2597, simple_loss=0.2988, pruned_loss=0.08082, ctc_loss=0.1473, over 16890.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2855, pruned_loss=0.06493, ctc_loss=0.1149, over 3294801.26 frames. ], batch size: 328, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:50:57,484 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.565e+02 3.474e+02 3.798e+02 4.507e+02 6.680e+02, threshold=7.596e+02, percent-clipped=0.0
2023-10-09 18:51:14,130 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2842942.6666666665, ans=0.0
2023-10-09 18:51:15,114 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2842942.6666666665, ans=0.125
2023-10-09 18:51:21,003 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2842942.6666666665, ans=0.125
2023-10-09 18:51:21,375 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.12 vs. limit=10.0
2023-10-09 18:51:29,199 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.56 vs. limit=22.5
2023-10-09 18:51:34,930 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.42 vs. limit=15.0
2023-10-09 18:51:39,562 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2843036.0, ans=0.0
2023-10-09 18:51:51,708 INFO [train.py:1031] (2/4) Epoch 14, batch 24500, loss[loss=0.2144, simple_loss=0.2551, pruned_loss=0.06574, ctc_loss=0.1055, over 16828.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2846, pruned_loss=0.06502, ctc_loss=0.1138, over 3299331.73 frames. ], batch size: 164, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:52:00,581 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2843082.6666666665, ans=0.125
2023-10-09 18:52:05,583 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2843129.3333333335, ans=0.0
2023-10-09 18:52:08,091 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2843129.3333333335, ans=0.125
2023-10-09 18:52:21,994 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2843176.0, ans=0.125
2023-10-09 18:52:22,054 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2843176.0, ans=0.125
2023-10-09 18:52:29,715 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=2843222.6666666665, ans=0.05
2023-10-09 18:52:36,451 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.15 vs. limit=15.0
2023-10-09 18:52:37,219 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2843222.6666666665, ans=0.125
2023-10-09 18:52:54,588 INFO [train.py:1031] (2/4) Epoch 14, batch 24550, loss[loss=0.2083, simple_loss=0.284, pruned_loss=0.04821, ctc_loss=0.09049, over 16961.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2826, pruned_loss=0.06293, ctc_loss=0.1096, over 3306336.27 frames. ], batch size: 258, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:53:03,099 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.233e+02 3.410e+02 4.193e+02 5.169e+02 8.028e+02, threshold=8.385e+02, percent-clipped=3.0
2023-10-09 18:53:27,381 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2843409.3333333335, ans=0.0
2023-10-09 18:53:36,945 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2843456.0, ans=0.125
2023-10-09 18:53:45,606 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2843502.6666666665, ans=0.5
2023-10-09 18:53:47,659 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=12.0
2023-10-09 18:53:57,853 INFO [train.py:1031] (2/4) Epoch 14, batch 24600, loss[loss=0.3438, simple_loss=0.3646, pruned_loss=0.1184, ctc_loss=0.2156, over 16716.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2848, pruned_loss=0.06264, ctc_loss=0.1096, over 3302207.10 frames. ], batch size: 384, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:54:05,428 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2843549.3333333335, ans=0.2
2023-10-09 18:54:12,104 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2843596.0, ans=0.2
2023-10-09 18:54:23,707 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2843642.6666666665, ans=0.125
2023-10-09 18:54:35,525 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2843689.3333333335, ans=0.125
2023-10-09 18:54:41,801 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2843689.3333333335, ans=0.07
2023-10-09 18:54:43,387 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2843689.3333333335, ans=0.125
2023-10-09 18:54:53,707 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2843736.0, ans=0.1
2023-10-09 18:55:02,710 INFO [train.py:1031] (2/4) Epoch 14, batch 24650, loss[loss=0.2369, simple_loss=0.3063, pruned_loss=0.06341, ctc_loss=0.1019, over 16781.00 frames. ], tot_loss[loss=0.2347, simple_loss=0.293, pruned_loss=0.06525, ctc_loss=0.1148, over 3301614.08 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 2.0
2023-10-09 18:55:05,769 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.43 vs. limit=15.0
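
The scaling.py:979 Whitening entries compare a covariance statistic ("metric") of a module's activations against a limit; a value under the limit means the whitening penalty stayed inactive. One plausible statistic, used below purely as an assumption, is the mean squared eigenvalue of the channel covariance divided by the squared mean eigenvalue: it equals 1.0 for perfectly white features and grows as energy concentrates in a few directions.

    import torch

    # Hedged sketch of a whitening diagnostic; the exact statistic computed
    # by scaling.py is an assumption here.
    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    white = torch.randn(4096, 256)
    print(whitening_metric(white))                  # close to 1.0
    skewed = white * torch.linspace(0.1, 3.0, 256)  # unequal channel energy
    print(whitening_metric(skewed))                 # well above 1.0
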
2023-10-09 18:55:13,665 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.456e+02 3.365e+02 3.995e+02 4.722e+02 9.808e+02, threshold=7.989e+02, percent-clipped=0.0
2023-10-09 18:55:14,089 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2843829.3333333335, ans=0.125
2023-10-09 18:55:16,895 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2843829.3333333335, ans=0.0
2023-10-09 18:55:19,719 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2843829.3333333335, ans=0.125
2023-10-09 18:55:31,051 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2843876.0, ans=0.0
2023-10-09 18:55:51,904 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=2843922.6666666665, ans=0.05
2023-10-09 18:56:05,668 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0
2023-10-09 18:56:06,107 INFO [train.py:1031] (2/4) Epoch 14, batch 24700, loss[loss=0.2233, simple_loss=0.2803, pruned_loss=0.06304, ctc_loss=0.1007, over 11116.00 frames. ], tot_loss[loss=0.239, simple_loss=0.3001, pruned_loss=0.06583, ctc_loss=0.1156, over 3288665.61 frames. ], batch size: 36, lr: 2.55e-03, grad_scale: 4.0
2023-10-09 18:56:18,675 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2844062.6666666665, ans=0.125
2023-10-09 18:56:23,596 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2844062.6666666665, ans=0.125
2023-10-09 18:56:29,833 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2844062.6666666665, ans=0.125
2023-10-09 18:56:43,060 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2844109.3333333335, ans=0.1
2023-10-09 18:56:50,832 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2844156.0, ans=0.125
2023-10-09 18:56:52,925 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2844156.0, ans=0.125
2023-10-09 18:56:55,172 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2844156.0, ans=0.025
2023-10-09 18:57:00,815 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=2844202.6666666665, ans=0.0
2023-10-09 18:57:00,846 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=2844202.6666666665, ans=0.1
2023-10-09 18:57:09,411 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0
2023-10-09 18:57:10,470 INFO [train.py:1031] (2/4) Epoch 14, batch 24750, loss[loss=0.2492, simple_loss=0.3326, pruned_loss=0.06095, ctc_loss=0.1098, over 16275.00 frames. ], tot_loss[loss=0.2449, simple_loss=0.304, pruned_loss=0.06875, ctc_loss=0.1205, over 3293916.48 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 2.0
], tot_loss[loss=0.2449, simple_loss=0.304, pruned_loss=0.06875, ctc_loss=0.1205, over 3293916.48 frames. ], batch size: 463, lr: 2.55e-03, grad_scale: 2.0 2023-10-09 18:57:18,228 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2844249.3333333335, ans=0.0 2023-10-09 18:57:22,865 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2844296.0, ans=0.2 2023-10-09 18:57:23,591 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.844e+02 3.622e+02 4.141e+02 4.992e+02 1.091e+03, threshold=8.281e+02, percent-clipped=4.0 2023-10-09 18:57:26,922 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.83 vs. limit=15.0 2023-10-09 18:57:27,137 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2023-10-09 18:57:49,627 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2844389.3333333335, ans=0.125 2023-10-09 18:57:53,174 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.52 vs. limit=12.0 2023-10-09 18:58:06,796 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=2844436.0, ans=6.0 2023-10-09 18:58:11,535 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0 2023-10-09 18:58:17,275 INFO [train.py:1031] (2/4) Epoch 14, batch 24800, loss[loss=0.2127, simple_loss=0.2422, pruned_loss=0.06975, ctc_loss=0.1096, over 16787.00 frames. ], tot_loss[loss=0.2436, simple_loss=0.3029, pruned_loss=0.06844, ctc_loss=0.1188, over 3292962.85 frames. ], batch size: 121, lr: 2.55e-03, grad_scale: 4.0 2023-10-09 18:58:34,735 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2844529.3333333335, ans=0.125 2023-10-09 18:59:15,577 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2844669.3333333335, ans=0.125 2023-10-09 18:59:20,851 INFO [train.py:1031] (2/4) Epoch 14, batch 24850, loss[loss=0.2438, simple_loss=0.2846, pruned_loss=0.07626, ctc_loss=0.1263, over 16715.00 frames. ], tot_loss[loss=0.2444, simple_loss=0.3023, pruned_loss=0.06936, ctc_loss=0.1194, over 3296297.22 frames. 
2023-10-09 18:59:23,040 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 18:59:29,813 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2844716.0, ans=0.2
2023-10-09 18:59:35,025 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.490e+02 3.266e+02 3.931e+02 4.617e+02 8.041e+02, threshold=7.862e+02, percent-clipped=0.0
2023-10-09 18:59:59,304 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2844856.0, ans=0.2
2023-10-09 19:00:02,080 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2844856.0, ans=0.125
2023-10-09 19:00:23,745 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2844902.6666666665, ans=0.0
2023-10-09 19:00:27,329 INFO [train.py:1031] (2/4) Epoch 14, batch 24900, loss[loss=0.2219, simple_loss=0.2642, pruned_loss=0.06751, ctc_loss=0.1114, over 16878.00 frames. ], tot_loss[loss=0.247, simple_loss=0.3052, pruned_loss=0.07015, ctc_loss=0.121, over 3295298.59 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:00:29,994 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=22.5
2023-10-09 19:01:00,209 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.09 vs. limit=15.0
2023-10-09 19:01:17,069 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.86 vs. limit=15.0
2023-10-09 19:01:26,810 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2845136.0, ans=0.125
2023-10-09 19:01:28,806 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2845136.0, ans=0.125
2023-10-09 19:01:30,761 INFO [train.py:1031] (2/4) Epoch 14, batch 24950, loss[loss=0.2125, simple_loss=0.264, pruned_loss=0.06088, ctc_loss=0.09808, over 16783.00 frames. ], tot_loss[loss=0.247, simple_loss=0.3075, pruned_loss=0.06927, ctc_loss=0.1202, over 3291611.22 frames. ], batch size: 164, lr: 2.54e-03, grad_scale: 1.0
2023-10-09 19:01:46,521 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+02 3.532e+02 4.120e+02 4.965e+02 9.701e+02, threshold=8.240e+02, percent-clipped=4.0
2023-10-09 19:01:58,671 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2845276.0, ans=0.05
2023-10-09 19:02:00,777 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2845276.0, ans=0.125
2023-10-09 19:02:12,621 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 19:02:13,742 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2845322.6666666665, ans=0.2
2023-10-09 19:02:24,607 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2845369.3333333335, ans=15.0
2023-10-09 19:02:32,820 INFO [train.py:1031] (2/4) Epoch 14, batch 25000, loss[loss=0.232, simple_loss=0.2805, pruned_loss=0.06954, ctc_loss=0.111, over 16859.00 frames. ], tot_loss[loss=0.2439, simple_loss=0.3027, pruned_loss=0.06871, ctc_loss=0.1193, over 3291523.99 frames. ], batch size: 121, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:02:42,638 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2845416.0, ans=0.1
2023-10-09 19:02:45,791 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2845462.6666666665, ans=0.125
2023-10-09 19:02:59,844 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2845509.3333333335, ans=0.125
2023-10-09 19:03:33,088 INFO [train.py:1031] (2/4) Epoch 14, batch 25050, loss[loss=0.2317, simple_loss=0.2854, pruned_loss=0.06691, ctc_loss=0.1103, over 16792.00 frames. ], tot_loss[loss=0.2419, simple_loss=0.2987, pruned_loss=0.06871, ctc_loss=0.119, over 3294148.08 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 1.0
2023-10-09 19:03:37,258 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2845649.3333333335, ans=0.125
2023-10-09 19:03:41,024 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2845649.3333333335, ans=0.125
2023-10-09 19:03:47,196 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2845696.0, ans=0.125
2023-10-09 19:03:50,011 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.415e+02 3.393e+02 3.859e+02 4.552e+02 1.527e+03, threshold=7.717e+02, percent-clipped=2.0
2023-10-09 19:03:59,851 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0
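
The grad_scale field moves between 1.0, 2.0, and 4.0 in the entries above, the signature of dynamic loss scaling in fp16 training: the scale is halved when a step produces non-finite gradients and grown again after a run of clean steps. PyTorch's torch.cuda.amp.GradScaler implements exactly this bookkeeping; the snippet shows the standard usage pattern with placeholder model and data, not the train.py specifics.

    import torch

    # Standard dynamic loss-scaling pattern; model/optimizer/data are
    # placeholders, only the GradScaler usage is the point.
    model = torch.nn.Linear(80, 2000).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=2.54e-3)
    scaler = torch.cuda.amp.GradScaler()

    for _ in range(10):
        x = torch.randn(8, 80, device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(x).square().mean()
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # skipped if gradients overflowed
        scaler.update()                # halve on overflow, grow on success
        print("grad_scale:", scaler.get_scale())
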
2023-10-09 19:04:06,410 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2845742.6666666665, ans=0.0
2023-10-09 19:04:19,114 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2845789.3333333335, ans=0.125
2023-10-09 19:04:34,838 INFO [train.py:1031] (2/4) Epoch 14, batch 25100, loss[loss=0.2108, simple_loss=0.2546, pruned_loss=0.06165, ctc_loss=0.1092, over 16743.00 frames. ], tot_loss[loss=0.2366, simple_loss=0.2941, pruned_loss=0.06643, ctc_loss=0.1156, over 3291345.05 frames. ], batch size: 121, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:04:37,825 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2845882.6666666665, ans=0.125
2023-10-09 19:04:38,696 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2845882.6666666665, ans=0.0
2023-10-09 19:04:39,778 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2845882.6666666665, ans=0.1
2023-10-09 19:04:47,964 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2845929.3333333335, ans=0.125
2023-10-09 19:04:53,840 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2845929.3333333335, ans=0.07
2023-10-09 19:04:53,925 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2845929.3333333335, ans=0.0
2023-10-09 19:05:01,467 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2845976.0, ans=0.125
2023-10-09 19:05:16,766 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2846022.6666666665, ans=0.125
2023-10-09 19:05:22,324 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2846022.6666666665, ans=0.125
2023-10-09 19:05:36,319 INFO [train.py:1031] (2/4) Epoch 14, batch 25150, loss[loss=0.196, simple_loss=0.2578, pruned_loss=0.05055, ctc_loss=0.08301, over 16800.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.2868, pruned_loss=0.06455, ctc_loss=0.1126, over 3297414.71 frames. ], batch size: 164, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:05:52,020 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.349e+02 2.975e+02 3.474e+02 4.105e+02 7.010e+02, threshold=6.948e+02, percent-clipped=0.0
2023-10-09 19:06:36,059 INFO [train.py:1031] (2/4) Epoch 14, batch 25200, loss[loss=0.2245, simple_loss=0.2703, pruned_loss=0.06623, ctc_loss=0.1158, over 16646.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2842, pruned_loss=0.06486, ctc_loss=0.1133, over 3300459.58 frames. ], batch size: 351, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:07:15,965 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2846489.3333333335, ans=0.1
2023-10-09 19:07:31,798 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.25 vs. limit=15.0
2023-10-09 19:07:35,929 INFO [train.py:1031] (2/4) Epoch 14, batch 25250, loss[loss=0.2988, simple_loss=0.3122, pruned_loss=0.1056, ctc_loss=0.1855, over 16869.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2831, pruned_loss=0.0658, ctc_loss=0.1151, over 3308062.47 frames. ], batch size: 384, lr: 2.54e-03, grad_scale: 1.0
2023-10-09 19:07:56,502 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.603e+02 3.269e+02 3.734e+02 4.463e+02 8.122e+02, threshold=7.469e+02, percent-clipped=1.0
2023-10-09 19:08:10,822 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2846676.0, ans=0.125
2023-10-09 19:08:26,995 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=2846769.3333333335, ans=0.02
2023-10-09 19:08:33,821 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2846769.3333333335, ans=0.125
2023-10-09 19:08:37,084 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2846769.3333333335, ans=0.125
2023-10-09 19:08:39,562 INFO [train.py:1031] (2/4) Epoch 14, batch 25300, loss[loss=0.2927, simple_loss=0.3607, pruned_loss=0.08328, ctc_loss=0.1455, over 16888.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.2895, pruned_loss=0.06804, ctc_loss=0.1193, over 3311079.18 frames. ], batch size: 242, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:09:20,628 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2846956.0, ans=0.125
2023-10-09 19:09:23,381 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2846956.0, ans=0.0
2023-10-09 19:09:26,684 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2846956.0, ans=0.0
2023-10-09 19:09:30,383 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2847002.6666666665, ans=0.125
2023-10-09 19:09:33,059 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2847002.6666666665, ans=0.025
2023-10-09 19:09:41,237 INFO [train.py:1031] (2/4) Epoch 14, batch 25350, loss[loss=0.2441, simple_loss=0.3002, pruned_loss=0.06878, ctc_loss=0.1261, over 15329.00 frames. ], tot_loss[loss=0.242, simple_loss=0.2958, pruned_loss=0.06961, ctc_loss=0.1225, over 3308550.10 frames. ], batch size: 526, lr: 2.54e-03, grad_scale: 2.0
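
Many ScheduledFloat names above end in balancer parameters: min_positive/max_positive bound the per-channel fraction of positive activations, min_abs/max_abs bound their mean magnitude, and prob is the probability the constraint is enforced on a given step. The sketch below only measures those statistics and flags violations; the real balancer corrects gradients rather than merely reporting, so treat this as an illustration.

    import torch

    # Hedged sketch: per-channel statistics a balancer would constrain.
    def balancer_stats(x, min_positive=0.05, max_positive=0.95, max_abs=10.0):
        # x: (num_frames, num_channels)
        frac_positive = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        bad = ((frac_positive < min_positive)
               | (frac_positive > max_positive)
               | (mean_abs > max_abs))
        return frac_positive, mean_abs, bad

    x = torch.randn(1024, 512).relu()  # ReLU output: roughly half positive
    frac, mag, bad = balancer_stats(x)
    print(f"{int(bad.sum())} of {x.shape[1]} channels outside limits")
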
2023-10-09 19:09:50,227 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2847049.3333333335, ans=0.125
2023-10-09 19:09:56,068 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2847096.0, ans=0.0
2023-10-09 19:10:01,553 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.611e+02 3.479e+02 4.151e+02 5.048e+02 8.470e+02, threshold=8.302e+02, percent-clipped=4.0
2023-10-09 19:10:03,386 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2847096.0, ans=0.125
2023-10-09 19:10:25,495 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2847189.3333333335, ans=0.1
2023-10-09 19:10:28,642 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2847236.0, ans=0.0
2023-10-09 19:10:33,727 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.18 vs. limit=22.5
2023-10-09 19:10:41,702 INFO [train.py:1031] (2/4) Epoch 14, batch 25400, loss[loss=0.216, simple_loss=0.2718, pruned_loss=0.05893, ctc_loss=0.1059, over 16726.00 frames. ], tot_loss[loss=0.2409, simple_loss=0.2931, pruned_loss=0.06988, ctc_loss=0.1224, over 3299101.35 frames. ], batch size: 151, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:10:43,557 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=2847282.6666666665, ans=15.0
2023-10-09 19:10:54,680 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.48 vs. limit=12.0
2023-10-09 19:11:04,307 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2847376.0, ans=0.0
2023-10-09 19:11:13,509 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2847376.0, ans=0.0
2023-10-09 19:11:28,878 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2847469.3333333335, ans=0.0
2023-10-09 19:11:35,242 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=2847469.3333333335, ans=0.05
2023-10-09 19:11:40,829 INFO [train.py:1031] (2/4) Epoch 14, batch 25450, loss[loss=0.204, simple_loss=0.2523, pruned_loss=0.05679, ctc_loss=0.1052, over 16795.00 frames. ], tot_loss[loss=0.2393, simple_loss=0.2901, pruned_loss=0.0698, ctc_loss=0.122, over 3303472.70 frames. ], batch size: 188, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:11:41,646 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0
2023-10-09 19:11:42,196 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2847516.0, ans=0.035
2023-10-09 19:11:47,018 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2847516.0, ans=0.125
2023-10-09 19:12:01,164 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.595e+02 3.142e+02 3.636e+02 4.300e+02 1.054e+03, threshold=7.273e+02, percent-clipped=3.0
2023-10-09 19:12:09,754 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2847609.3333333335, ans=0.1
2023-10-09 19:12:23,408 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.29 vs. limit=10.0
2023-10-09 19:12:24,191 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2847656.0, ans=0.125
2023-10-09 19:12:27,466 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2847656.0, ans=0.0
2023-10-09 19:12:32,356 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2847702.6666666665, ans=0.04949747468305833
2023-10-09 19:12:34,106 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2847702.6666666665, ans=0.09899494936611666
2023-10-09 19:12:41,753 INFO [train.py:1031] (2/4) Epoch 14, batch 25500, loss[loss=0.2235, simple_loss=0.2721, pruned_loss=0.06424, ctc_loss=0.116, over 15283.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2877, pruned_loss=0.06827, ctc_loss=0.1199, over 3301085.31 frames. ], batch size: 526, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:13:00,700 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2847796.0, ans=0.0
2023-10-09 19:13:08,279 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2847842.6666666665, ans=0.0
2023-10-09 19:13:12,900 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=22.5
2023-10-09 19:13:19,865 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2847889.3333333335, ans=0.0
2023-10-09 19:13:29,658 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2847889.3333333335, ans=0.2
2023-10-09 19:13:30,765 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2847936.0, ans=0.125
2023-10-09 19:13:31,738 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2847936.0, ans=0.0
2023-10-09 19:13:32,734 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2847936.0, ans=0.05
2023-10-09 19:13:34,204 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0
2023-10-09 19:13:34,938 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2847936.0, ans=0.0
2023-10-09 19:13:34,946 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2847936.0, ans=0.1
2023-10-09 19:13:44,904 INFO [train.py:1031] (2/4) Epoch 14, batch 25550, loss[loss=0.2286, simple_loss=0.2656, pruned_loss=0.07184, ctc_loss=0.1197, over 16473.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2893, pruned_loss=0.0696, ctc_loss=0.1223, over 3295485.88 frames. ], batch size: 70, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:14:07,199 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.382e+02 3.271e+02 3.768e+02 4.486e+02 1.096e+03, threshold=7.537e+02, percent-clipped=1.0
2023-10-09 19:14:08,687 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2848076.0, ans=0.1
2023-10-09 19:14:12,433 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0
2023-10-09 19:14:16,406 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2848076.0, ans=0.0
2023-10-09 19:14:25,733 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2848122.6666666665, ans=0.0
2023-10-09 19:14:36,275 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2848169.3333333335, ans=0.1
2023-10-09 19:14:40,603 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2848169.3333333335, ans=0.04949747468305833
2023-10-09 19:14:45,704 INFO [train.py:1031] (2/4) Epoch 14, batch 25600, loss[loss=0.2472, simple_loss=0.2946, pruned_loss=0.07377, ctc_loss=0.1309, over 16203.00 frames. ], tot_loss[loss=0.2427, simple_loss=0.2931, pruned_loss=0.07111, ctc_loss=0.1251, over 3298370.14 frames. ], batch size: 463, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:15:06,824 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2848262.6666666665, ans=0.0
2023-10-09 19:15:11,271 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2848309.3333333335, ans=0.125
2023-10-09 19:15:24,024 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2848356.0, ans=0.0
2023-10-09 19:15:24,176 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.67 vs. limit=15.0
2023-10-09 19:15:47,650 INFO [train.py:1031] (2/4) Epoch 14, batch 25650, loss[loss=0.3701, simple_loss=0.397, pruned_loss=0.1259, ctc_loss=0.2285, over 16498.00 frames. ], tot_loss[loss=0.2485, simple_loss=0.2995, pruned_loss=0.07309, ctc_loss=0.1284, over 3301813.61 frames. ], batch size: 416, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:15:57,053 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=15.0
2023-10-09 19:16:01,568 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2848496.0, ans=0.1
2023-10-09 19:16:10,585 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2848496.0, ans=0.125
2023-10-09 19:16:11,373 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.452e+02 3.570e+02 3.954e+02 4.505e+02 1.083e+03, threshold=7.908e+02, percent-clipped=2.0
2023-10-09 19:16:19,280 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2848542.6666666665, ans=0.125
2023-10-09 19:16:23,112 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2848542.6666666665, ans=0.125
2023-10-09 19:16:23,174 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2848542.6666666665, ans=0.2
2023-10-09 19:16:28,913 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0
2023-10-09 19:16:50,505 INFO [train.py:1031] (2/4) Epoch 14, batch 25700, loss[loss=0.2419, simple_loss=0.285, pruned_loss=0.07534, ctc_loss=0.1203, over 16363.00 frames. ], tot_loss[loss=0.2548, simple_loss=0.3062, pruned_loss=0.07525, ctc_loss=0.1322, over 3306549.78 frames. ], batch size: 70, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:16:59,966 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2848682.6666666665, ans=0.1
2023-10-09 19:17:18,856 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2848776.0, ans=0.2
2023-10-09 19:17:21,053 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2848776.0, ans=0.125
2023-10-09 19:17:22,163 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2848776.0, ans=0.0
2023-10-09 19:17:33,821 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.22 vs. limit=15.0
2023-10-09 19:17:39,308 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.39 vs. limit=22.5
2023-10-09 19:17:45,412 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2848869.3333333335, ans=0.125
2023-10-09 19:17:46,515 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2848869.3333333335, ans=0.125
2023-10-09 19:17:51,124 INFO [train.py:1031] (2/4) Epoch 14, batch 25750, loss[loss=0.2355, simple_loss=0.2854, pruned_loss=0.06937, ctc_loss=0.1171, over 16732.00 frames. ], tot_loss[loss=0.2546, simple_loss=0.3062, pruned_loss=0.07514, ctc_loss=0.1318, over 3300141.33 frames. ], batch size: 111, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:18:06,822 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2848962.6666666665, ans=0.125
2023-10-09 19:18:09,532 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-10-09 19:18:14,869 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2848962.6666666665, ans=0.125
2023-10-09 19:18:17,302 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.324e+02 3.581e+02 3.886e+02 4.426e+02 7.686e+02, threshold=7.772e+02, percent-clipped=0.0
2023-10-09 19:18:17,914 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=22.5
2023-10-09 19:18:21,924 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2849009.3333333335, ans=0.125
2023-10-09 19:18:32,977 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2849056.0, ans=0.125
2023-10-09 19:18:43,278 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2849102.6666666665, ans=0.0
2023-10-09 19:18:56,349 INFO [train.py:1031] (2/4) Epoch 14, batch 25800, loss[loss=0.2151, simple_loss=0.2484, pruned_loss=0.06947, ctc_loss=0.1069, over 16926.00 frames. ], tot_loss[loss=0.246, simple_loss=0.3018, pruned_loss=0.07035, ctc_loss=0.1238, over 3302115.63 frames. ], batch size: 78, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:18:57,786 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2849149.3333333335, ans=0.125
2023-10-09 19:18:59,693 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2849149.3333333335, ans=0.125
2023-10-09 19:19:22,647 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2849242.6666666665, ans=0.0
2023-10-09 19:19:59,395 INFO [train.py:1031] (2/4) Epoch 14, batch 25850, loss[loss=0.2527, simple_loss=0.3385, pruned_loss=0.0622, ctc_loss=0.1061, over 16281.00 frames. ], tot_loss[loss=0.2416, simple_loss=0.2992, pruned_loss=0.06809, ctc_loss=0.1194, over 3296465.17 frames. ], batch size: 463, lr: 2.54e-03, grad_scale: 2.0
2023-10-09 19:20:01,745 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2849382.6666666665, ans=0.0
2023-10-09 19:20:07,182 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=22.5
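
The scaling.py:1069 WithLoss entries report an auxiliary penalty attached to self-attention weights; loss-sum=0.000e+00 says the penalty contributed nothing over the logged interval. Below is a minimal sketch of attaching a module-level auxiliary loss and reading it back; the wrapper name, the example penalty, and the collection scheme are all assumptions.

    import torch

    # Hedged sketch of a pass-through module that accumulates an auxiliary
    # loss, in the spirit of the WithLoss entries; `AuxLoss` is hypothetical.
    class AuxLoss(torch.nn.Module):
        def __init__(self, name):
            super().__init__()
            self.name = name
            self.loss_sum = torch.tensor(0.0)

        def forward(self, attn_weights):
            # Example penalty: discourage attention weights above 0.5.
            penalty = (attn_weights - 0.5).clamp(min=0.0).mean()
            self.loss_sum = self.loss_sum + penalty
            return attn_weights  # activations pass through unchanged

    aux = AuxLoss("encoder.layers.0.self_attn_weights")
    w = torch.softmax(torch.randn(4, 16, 16), dim=-1)
    _ = aux(w)
    print(f"WithLoss: name={aux.name}, loss-sum={aux.loss_sum.item():.3e}")
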
2023-10-09 19:20:15,431 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2849429.3333333335, ans=0.2
2023-10-09 19:20:23,041 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2849476.0, ans=0.125
2023-10-09 19:20:24,802 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.590e+02 3.413e+02 3.966e+02 4.957e+02 9.645e+02, threshold=7.933e+02, percent-clipped=3.0
2023-10-09 19:20:28,034 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.90 vs. limit=15.0
2023-10-09 19:20:30,027 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2849476.0, ans=0.0
2023-10-09 19:20:34,381 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2849476.0, ans=0.2
2023-10-09 19:20:35,279 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2849522.6666666665, ans=0.125
2023-10-09 19:20:44,738 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2849522.6666666665, ans=0.1
2023-10-09 19:20:57,323 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 19:20:58,321 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2849569.3333333335, ans=0.2
2023-10-09 19:21:00,847 INFO [train.py:1031] (2/4) Epoch 14, batch 25900, loss[loss=0.1923, simple_loss=0.2501, pruned_loss=0.05033, ctc_loss=0.08471, over 16789.00 frames. ], tot_loss[loss=0.2379, simple_loss=0.2956, pruned_loss=0.06688, ctc_loss=0.1161, over 3283759.68 frames. ], batch size: 164, lr: 2.54e-03, grad_scale: 4.0
2023-10-09 19:21:02,318 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=2849616.0, ans=0.5
2023-10-09 19:21:26,231 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2849709.3333333335, ans=0.125
2023-10-09 19:21:43,518 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2849756.0, ans=0.125
2023-10-09 19:21:46,201 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2849756.0, ans=0.125
2023-10-09 19:21:48,924 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.70 vs. limit=22.5
2023-10-09 19:21:49,987 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2849802.6666666665, ans=0.125
2023-10-09 19:22:01,801 INFO [train.py:1031] (2/4) Epoch 14, batch 25950, loss[loss=0.1907, simple_loss=0.2657, pruned_loss=0.04228, ctc_loss=0.07773, over 17011.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2911, pruned_loss=0.06278, ctc_loss=0.1096, over 3296111.31 frames. ], batch size: 216, lr: 2.54e-03, grad_scale: 2.0
], batch size: 216, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:22:02,207 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2849849.3333333335, ans=0.1 2023-10-09 19:22:17,676 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.89 vs. limit=6.0 2023-10-09 19:22:28,773 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.139e+02 2.824e+02 3.535e+02 4.166e+02 1.027e+03, threshold=7.071e+02, percent-clipped=2.0 2023-10-09 19:23:00,519 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2850036.0, ans=0.125 2023-10-09 19:23:01,500 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2850082.6666666665, ans=0.125 2023-10-09 19:23:02,804 INFO [train.py:1031] (2/4) Epoch 14, batch 26000, loss[loss=0.2275, simple_loss=0.2713, pruned_loss=0.06765, ctc_loss=0.1207, over 16909.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2881, pruned_loss=0.06335, ctc_loss=0.1109, over 3302261.57 frames. ], batch size: 242, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:23:09,998 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2850082.6666666665, ans=0.05 2023-10-09 19:23:23,238 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=15.0 2023-10-09 19:23:26,773 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2023-10-09 19:23:40,610 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2850222.6666666665, ans=0.125 2023-10-09 19:23:49,377 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2850222.6666666665, ans=0.125 2023-10-09 19:23:55,401 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2850269.3333333335, ans=0.5 2023-10-09 19:23:57,574 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2850269.3333333335, ans=0.2 2023-10-09 19:24:00,129 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2850269.3333333335, ans=0.0 2023-10-09 19:24:00,409 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=2850269.3333333335, ans=15.0 2023-10-09 19:24:04,536 INFO [train.py:1031] (2/4) Epoch 14, batch 26050, loss[loss=0.2276, simple_loss=0.2976, pruned_loss=0.05999, ctc_loss=0.09399, over 16812.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2876, pruned_loss=0.06245, ctc_loss=0.1094, over 3305523.21 frames. 
], batch size: 121, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:24:12,827 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2850316.0, ans=0.125 2023-10-09 19:24:31,163 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.011e+02 3.542e+02 4.270e+02 6.836e+02, threshold=7.085e+02, percent-clipped=0.0 2023-10-09 19:24:35,332 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2850409.3333333335, ans=0.125 2023-10-09 19:24:35,341 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2850409.3333333335, ans=0.1 2023-10-09 19:24:40,078 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:24:59,327 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2850502.6666666665, ans=0.125 2023-10-09 19:24:59,370 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2850502.6666666665, ans=0.125 2023-10-09 19:25:04,427 INFO [train.py:1031] (2/4) Epoch 14, batch 26100, loss[loss=0.2574, simple_loss=0.3261, pruned_loss=0.07127, ctc_loss=0.1154, over 16844.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2904, pruned_loss=0.06179, ctc_loss=0.107, over 3303878.15 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:25:07,816 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2850549.3333333335, ans=0.125 2023-10-09 19:25:15,299 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2850596.0, ans=0.125 2023-10-09 19:25:22,939 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2850596.0, ans=0.125 2023-10-09 19:25:26,264 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=12.0 2023-10-09 19:25:47,433 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2850689.3333333335, ans=0.125 2023-10-09 19:26:06,162 INFO [train.py:1031] (2/4) Epoch 14, batch 26150, loss[loss=0.2569, simple_loss=0.3309, pruned_loss=0.06919, ctc_loss=0.1115, over 16815.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2945, pruned_loss=0.06456, ctc_loss=0.1115, over 3308224.19 frames. 
], batch size: 95, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:26:09,316 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2850782.6666666665, ans=0.125 2023-10-09 19:26:36,016 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.254e+02 3.218e+02 3.789e+02 4.435e+02 6.214e+02, threshold=7.579e+02, percent-clipped=0.0 2023-10-09 19:26:38,339 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=2850876.0, ans=0.1 2023-10-09 19:26:48,573 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2850922.6666666665, ans=0.0 2023-10-09 19:27:07,869 INFO [train.py:1031] (2/4) Epoch 14, batch 26200, loss[loss=0.1957, simple_loss=0.2764, pruned_loss=0.04316, ctc_loss=0.07191, over 15197.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.2933, pruned_loss=0.06508, ctc_loss=0.1121, over 3305416.56 frames. ], batch size: 529, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:27:38,858 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2851109.3333333335, ans=0.125 2023-10-09 19:28:00,618 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=2851202.6666666665, ans=0.5 2023-10-09 19:28:09,472 INFO [train.py:1031] (2/4) Epoch 14, batch 26250, loss[loss=0.2078, simple_loss=0.2872, pruned_loss=0.04731, ctc_loss=0.08449, over 16880.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2817, pruned_loss=0.06145, ctc_loss=0.1053, over 3300077.68 frames. ], batch size: 310, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:28:43,499 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 3.073e+02 4.030e+02 5.136e+02 8.779e+02, threshold=8.059e+02, percent-clipped=2.0 2023-10-09 19:28:48,249 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2851389.3333333335, ans=10.0 2023-10-09 19:28:50,320 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2851389.3333333335, ans=0.1 2023-10-09 19:29:05,549 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2851436.0, ans=0.125 2023-10-09 19:29:13,858 INFO [train.py:1031] (2/4) Epoch 14, batch 26300, loss[loss=0.2496, simple_loss=0.2994, pruned_loss=0.07476, ctc_loss=0.1255, over 16812.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2836, pruned_loss=0.06101, ctc_loss=0.1047, over 3296019.76 frames. 
], batch size: 121, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:29:15,243 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2851482.6666666665, ans=0.125 2023-10-09 19:29:15,349 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2851482.6666666665, ans=0.125 2023-10-09 19:29:32,286 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2851529.3333333335, ans=0.2 2023-10-09 19:29:54,330 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2851622.6666666665, ans=0.125 2023-10-09 19:30:00,829 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2851622.6666666665, ans=0.125 2023-10-09 19:30:05,816 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2851669.3333333335, ans=0.125 2023-10-09 19:30:18,137 INFO [train.py:1031] (2/4) Epoch 14, batch 26350, loss[loss=0.2465, simple_loss=0.3153, pruned_loss=0.06506, ctc_loss=0.1191, over 16848.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2918, pruned_loss=0.06424, ctc_loss=0.1119, over 3296157.17 frames. ], batch size: 242, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:30:20,701 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851716.0, ans=0.1 2023-10-09 19:30:25,489 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2851716.0, ans=0.1 2023-10-09 19:30:30,475 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851762.6666666665, ans=0.1 2023-10-09 19:30:30,553 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851762.6666666665, ans=0.1 2023-10-09 19:30:31,513 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2851762.6666666665, ans=0.1 2023-10-09 19:30:40,254 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2851762.6666666665, ans=0.1 2023-10-09 19:30:49,813 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.714e+02 3.520e+02 4.150e+02 4.845e+02 1.370e+03, threshold=8.299e+02, percent-clipped=2.0 2023-10-09 19:31:17,331 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2851902.6666666665, ans=0.0 2023-10-09 19:31:20,348 INFO [train.py:1031] (2/4) Epoch 14, batch 26400, loss[loss=0.2392, simple_loss=0.2993, pruned_loss=0.06427, ctc_loss=0.1267, over 16794.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.2924, pruned_loss=0.06465, ctc_loss=0.1129, over 3304368.81 frames. 
], batch size: 329, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:31:24,083 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:31:25,110 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:31:28,824 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2851949.3333333335, ans=0.125 2023-10-09 19:32:19,539 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2852136.0, ans=0.1 2023-10-09 19:32:24,384 INFO [train.py:1031] (2/4) Epoch 14, batch 26450, loss[loss=0.1778, simple_loss=0.2322, pruned_loss=0.04637, ctc_loss=0.07639, over 16718.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2893, pruned_loss=0.06251, ctc_loss=0.1092, over 3304132.98 frames. ], batch size: 102, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:32:53,638 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2852276.0, ans=0.125 2023-10-09 19:32:58,049 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.345e+02 3.048e+02 3.586e+02 4.298e+02 7.757e+02, threshold=7.171e+02, percent-clipped=0.0 2023-10-09 19:33:18,588 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2852369.3333333335, ans=0.1 2023-10-09 19:33:25,303 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2852369.3333333335, ans=0.0 2023-10-09 19:33:28,750 INFO [train.py:1031] (2/4) Epoch 14, batch 26500, loss[loss=0.2606, simple_loss=0.3144, pruned_loss=0.07492, ctc_loss=0.1423, over 16204.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2897, pruned_loss=0.06299, ctc_loss=0.1095, over 3296684.31 frames. ], batch size: 463, lr: 2.54e-03, grad_scale: 8.0 2023-10-09 19:33:38,576 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=15.0 2023-10-09 19:33:46,148 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2852462.6666666665, ans=0.125 2023-10-09 19:33:46,696 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.78 vs. limit=10.0 2023-10-09 19:34:19,420 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2852602.6666666665, ans=0.125 2023-10-09 19:34:30,312 INFO [train.py:1031] (2/4) Epoch 14, batch 26550, loss[loss=0.2266, simple_loss=0.3036, pruned_loss=0.0547, ctc_loss=0.1003, over 16913.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.2934, pruned_loss=0.06561, ctc_loss=0.1137, over 3295359.68 frames. 
], batch size: 258, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:34:33,430 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2852649.3333333335, ans=0.1 2023-10-09 19:35:06,513 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.325e+02 3.537e+02 4.198e+02 5.222e+02 9.143e+02, threshold=8.395e+02, percent-clipped=3.0 2023-10-09 19:35:19,171 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2023-10-09 19:35:32,291 INFO [train.py:1031] (2/4) Epoch 14, batch 26600, loss[loss=0.2495, simple_loss=0.3168, pruned_loss=0.06695, ctc_loss=0.1207, over 16878.00 frames. ], tot_loss[loss=0.2366, simple_loss=0.2972, pruned_loss=0.06523, ctc_loss=0.1139, over 3288144.60 frames. ], batch size: 228, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:35:40,045 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:36:03,330 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2852976.0, ans=0.2 2023-10-09 19:36:08,138 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2853022.6666666665, ans=0.125 2023-10-09 19:36:10,879 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2853022.6666666665, ans=0.125 2023-10-09 19:36:23,327 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.87 vs. limit=15.0 2023-10-09 19:36:34,511 INFO [train.py:1031] (2/4) Epoch 14, batch 26650, loss[loss=0.1991, simple_loss=0.3054, pruned_loss=0.03215, ctc_loss=0.07108, over 15078.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.2977, pruned_loss=0.06201, ctc_loss=0.11, over 3277374.15 frames. ], batch size: 527, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 19:36:41,946 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2853116.0, ans=0.0 2023-10-09 19:36:43,597 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2853116.0, ans=0.125 2023-10-09 19:36:47,684 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=12.0 2023-10-09 19:36:55,400 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2853162.6666666665, ans=0.125 2023-10-09 19:37:07,754 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. 
limit=15.0 2023-10-09 19:37:09,568 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2853256.0, ans=0.125 2023-10-09 19:37:09,673 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2853256.0, ans=0.125 2023-10-09 19:37:10,950 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 3.027e+02 3.490e+02 4.414e+02 7.979e+02, threshold=6.980e+02, percent-clipped=0.0 2023-10-09 19:37:17,275 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2853256.0, ans=0.125 2023-10-09 19:37:26,925 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=2853302.6666666665, ans=0.2 2023-10-09 19:37:35,081 INFO [train.py:1031] (2/4) Epoch 14, batch 26700, loss[loss=0.2023, simple_loss=0.2511, pruned_loss=0.05702, ctc_loss=0.09837, over 16660.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2905, pruned_loss=0.0599, ctc_loss=0.1067, over 3282282.31 frames. ], batch size: 151, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:38:05,317 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2853442.6666666665, ans=0.125 2023-10-09 19:38:06,285 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2853442.6666666665, ans=0.05 2023-10-09 19:38:36,853 INFO [train.py:1031] (2/4) Epoch 14, batch 26750, loss[loss=0.2005, simple_loss=0.2543, pruned_loss=0.054, ctc_loss=0.09678, over 16763.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2838, pruned_loss=0.05962, ctc_loss=0.1059, over 3289893.24 frames. ], batch size: 188, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:38:58,812 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2023-10-09 19:39:14,602 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.317e+02 3.221e+02 3.735e+02 4.264e+02 6.455e+02, threshold=7.471e+02, percent-clipped=0.0 2023-10-09 19:39:31,487 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2853769.3333333335, ans=0.0 2023-10-09 19:39:38,949 INFO [train.py:1031] (2/4) Epoch 14, batch 26800, loss[loss=0.2112, simple_loss=0.2536, pruned_loss=0.06168, ctc_loss=0.1138, over 15355.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2803, pruned_loss=0.05971, ctc_loss=0.1058, over 3288148.26 frames. ], batch size: 527, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:40:01,488 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2853862.6666666665, ans=0.125 2023-10-09 19:40:11,300 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.63 vs. limit=22.5 2023-10-09 19:40:23,849 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.62 vs. 
limit=22.5 2023-10-09 19:40:27,992 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2854002.6666666665, ans=0.0 2023-10-09 19:40:41,949 INFO [train.py:1031] (2/4) Epoch 14, batch 26850, loss[loss=0.186, simple_loss=0.2324, pruned_loss=0.05193, ctc_loss=0.08967, over 10572.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.2843, pruned_loss=0.06326, ctc_loss=0.1116, over 3292129.83 frames. ], batch size: 36, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:40:44,199 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.70 vs. limit=12.0 2023-10-09 19:40:58,283 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2854096.0, ans=0.125 2023-10-09 19:41:21,410 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.478e+02 3.526e+02 4.022e+02 4.797e+02 9.323e+02, threshold=8.043e+02, percent-clipped=3.0 2023-10-09 19:41:45,193 INFO [train.py:1031] (2/4) Epoch 14, batch 26900, loss[loss=0.2131, simple_loss=0.2787, pruned_loss=0.05463, ctc_loss=0.09592, over 16873.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2898, pruned_loss=0.06304, ctc_loss=0.1117, over 3297184.50 frames. ], batch size: 189, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:41:45,793 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.83 vs. limit=15.0 2023-10-09 19:41:45,837 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.15 vs. limit=15.0 2023-10-09 19:41:47,569 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2854282.6666666665, ans=0.125 2023-10-09 19:42:02,827 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.65 vs. limit=10.0 2023-10-09 19:42:08,831 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:42:10,584 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2854376.0, ans=0.125 2023-10-09 19:42:25,210 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2854422.6666666665, ans=0.5 2023-10-09 19:42:47,763 INFO [train.py:1031] (2/4) Epoch 14, batch 26950, loss[loss=0.2057, simple_loss=0.2402, pruned_loss=0.06464, ctc_loss=0.1051, over 11348.00 frames. ], tot_loss[loss=0.231, simple_loss=0.291, pruned_loss=0.06303, ctc_loss=0.1122, over 3299567.42 frames. ], batch size: 35, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:42:52,837 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2854516.0, ans=0.2 2023-10-09 19:42:52,905 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2854516.0, ans=0.125 2023-10-09 19:42:55,802 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. 
limit=6.0 2023-10-09 19:42:56,598 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2854516.0, ans=0.0 2023-10-09 19:42:59,749 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2854562.6666666665, ans=0.125 2023-10-09 19:43:26,398 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.573e+02 3.110e+02 3.559e+02 4.212e+02 9.939e+02, threshold=7.118e+02, percent-clipped=2.0 2023-10-09 19:43:39,048 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2854702.6666666665, ans=0.2 2023-10-09 19:43:48,324 INFO [train.py:1031] (2/4) Epoch 14, batch 27000, loss[loss=0.2075, simple_loss=0.2557, pruned_loss=0.05891, ctc_loss=0.1037, over 16781.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.2841, pruned_loss=0.06327, ctc_loss=0.1121, over 3307489.17 frames. ], batch size: 228, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:43:48,325 INFO [train.py:1054] (2/4) Computing validation loss 2023-10-09 19:44:05,973 INFO [zipformer.py:1853] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.0186, 2.6124, 2.9788, 2.8917], device='cuda:2') 2023-10-09 19:44:06,709 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2336, simple_loss=0.3018, pruned_loss=0.06376, ctc_loss=0.09459, over 1796401.00 frames. 2023-10-09 19:44:06,709 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB 2023-10-09 19:44:11,874 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2854749.3333333335, ans=0.2 2023-10-09 19:44:12,793 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2854749.3333333335, ans=0.125 2023-10-09 19:44:23,694 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2023-10-09 19:44:28,315 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2854796.0, ans=0.2 2023-10-09 19:45:01,939 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2854936.0, ans=0.125 2023-10-09 19:45:06,467 INFO [train.py:1031] (2/4) Epoch 14, batch 27050, loss[loss=0.2122, simple_loss=0.2674, pruned_loss=0.05919, ctc_loss=0.09662, over 16952.00 frames. ], tot_loss[loss=0.222, simple_loss=0.2781, pruned_loss=0.06142, ctc_loss=0.1078, over 3316995.21 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:45:12,846 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.99 vs. limit=22.5 2023-10-09 19:45:35,657 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=27.53 vs. 
limit=22.5 2023-10-09 19:45:40,016 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2855122.6666666665, ans=0.0 2023-10-09 19:45:44,970 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.824e+02 3.206e+02 4.209e+02 1.336e+03, threshold=6.413e+02, percent-clipped=5.0 2023-10-09 19:45:49,815 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.29 vs. limit=15.0 2023-10-09 19:45:53,717 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2023-10-09 19:46:05,132 INFO [train.py:1031] (2/4) Epoch 14, batch 27100, loss[loss=0.2004, simple_loss=0.2512, pruned_loss=0.05507, ctc_loss=0.09856, over 16457.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2728, pruned_loss=0.05958, ctc_loss=0.1038, over 3318934.07 frames. ], batch size: 466, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:46:16,489 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2855262.6666666665, ans=0.125 2023-10-09 19:46:20,490 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-10-09 19:46:53,522 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2855402.6666666665, ans=0.1 2023-10-09 19:46:56,240 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=2855402.6666666665, ans=0.05 2023-10-09 19:47:04,172 INFO [train.py:1031] (2/4) Epoch 14, batch 27150, loss[loss=0.2185, simple_loss=0.2413, pruned_loss=0.0718, ctc_loss=0.1301, over 15619.00 frames. ], tot_loss[loss=0.217, simple_loss=0.272, pruned_loss=0.06012, ctc_loss=0.1044, over 3315551.23 frames. ], batch size: 530, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:47:05,528 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2855449.3333333335, ans=0.0 2023-10-09 19:47:08,259 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2855449.3333333335, ans=0.2 2023-10-09 19:47:08,298 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2855449.3333333335, ans=0.0 2023-10-09 19:47:20,000 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.60 vs. limit=15.0 2023-10-09 19:47:20,658 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2855496.0, ans=0.1 2023-10-09 19:47:21,985 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.09 vs. limit=12.0 2023-10-09 19:47:28,907 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2023-10-09 19:47:33,834 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. 
limit=15.0 2023-10-09 19:47:46,371 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.370e+02 3.038e+02 3.523e+02 4.275e+02 1.319e+03, threshold=7.047e+02, percent-clipped=7.0 2023-10-09 19:48:03,950 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2855636.0, ans=0.0 2023-10-09 19:48:05,720 INFO [train.py:1031] (2/4) Epoch 14, batch 27200, loss[loss=0.2341, simple_loss=0.3214, pruned_loss=0.05406, ctc_loss=0.0969, over 16798.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.282, pruned_loss=0.06034, ctc_loss=0.1059, over 3309389.32 frames. ], batch size: 188, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:48:07,117 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2855682.6666666665, ans=0.125 2023-10-09 19:48:17,231 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2855729.3333333335, ans=0.125 2023-10-09 19:48:19,646 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2855729.3333333335, ans=0.04949747468305833 2023-10-09 19:48:31,022 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2855776.0, ans=0.125 2023-10-09 19:49:06,308 INFO [train.py:1031] (2/4) Epoch 14, batch 27250, loss[loss=0.2578, simple_loss=0.295, pruned_loss=0.08101, ctc_loss=0.1467, over 16734.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2837, pruned_loss=0.0605, ctc_loss=0.1064, over 3293201.78 frames. ], batch size: 328, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:49:17,032 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2855916.0, ans=0.0 2023-10-09 19:49:50,132 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.90 vs. limit=15.0 2023-10-09 19:49:51,926 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 3.213e+02 3.949e+02 4.744e+02 1.249e+03, threshold=7.899e+02, percent-clipped=6.0 2023-10-09 19:49:52,337 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2856056.0, ans=0.09899494936611666 2023-10-09 19:50:01,874 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=22.5 2023-10-09 19:50:10,442 INFO [train.py:1031] (2/4) Epoch 14, batch 27300, loss[loss=0.1807, simple_loss=0.2515, pruned_loss=0.04081, ctc_loss=0.07051, over 16822.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2787, pruned_loss=0.05994, ctc_loss=0.1056, over 3291411.85 frames. ], batch size: 189, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:50:21,168 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:50:55,043 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.65 vs. 
limit=15.0 2023-10-09 19:50:58,905 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2856289.3333333335, ans=0.09899494936611666 2023-10-09 19:50:59,911 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2856336.0, ans=0.125 2023-10-09 19:51:13,358 INFO [train.py:1031] (2/4) Epoch 14, batch 27350, loss[loss=0.1909, simple_loss=0.2413, pruned_loss=0.05296, ctc_loss=0.08648, over 16514.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2759, pruned_loss=0.05703, ctc_loss=0.101, over 3289289.76 frames. ], batch size: 110, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:51:29,152 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2856429.3333333335, ans=0.2 2023-10-09 19:51:36,044 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2856429.3333333335, ans=0.025 2023-10-09 19:51:41,714 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2856476.0, ans=0.0 2023-10-09 19:51:57,548 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.12 vs. limit=15.0 2023-10-09 19:51:58,295 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2856522.6666666665, ans=0.125 2023-10-09 19:51:58,975 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.714e+02 3.156e+02 4.138e+02 1.229e+03, threshold=6.312e+02, percent-clipped=2.0 2023-10-09 19:52:09,522 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=2856569.3333333335, ans=0.125 2023-10-09 19:52:15,404 INFO [train.py:1031] (2/4) Epoch 14, batch 27400, loss[loss=0.1873, simple_loss=0.2553, pruned_loss=0.04327, ctc_loss=0.08216, over 16936.00 frames. ], tot_loss[loss=0.2097, simple_loss=0.2724, pruned_loss=0.05421, ctc_loss=0.0963, over 3290520.66 frames. ], batch size: 259, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:52:19,802 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.71 vs. limit=15.0 2023-10-09 19:52:21,660 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2856616.0, ans=0.0 2023-10-09 19:52:55,655 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2856756.0, ans=0.05 2023-10-09 19:52:57,621 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2856756.0, ans=0.125 2023-10-09 19:53:07,150 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2856802.6666666665, ans=0.0 2023-10-09 19:53:15,199 INFO [train.py:1031] (2/4) Epoch 14, batch 27450, loss[loss=0.1955, simple_loss=0.242, pruned_loss=0.05546, ctc_loss=0.09518, over 11732.00 frames. ], tot_loss[loss=0.2086, simple_loss=0.2683, pruned_loss=0.05495, ctc_loss=0.09759, over 3288452.81 frames. 
], batch size: 35, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:53:29,501 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2856896.0, ans=0.0 2023-10-09 19:53:38,942 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2856942.6666666665, ans=0.0 2023-10-09 19:53:58,303 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2856989.3333333335, ans=0.125 2023-10-09 19:54:00,073 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.825e+02 3.198e+02 4.029e+02 6.832e+02, threshold=6.397e+02, percent-clipped=4.0 2023-10-09 19:54:16,222 INFO [train.py:1031] (2/4) Epoch 14, batch 27500, loss[loss=0.1826, simple_loss=0.2452, pruned_loss=0.04441, ctc_loss=0.07787, over 16687.00 frames. ], tot_loss[loss=0.2079, simple_loss=0.2686, pruned_loss=0.05433, ctc_loss=0.09648, over 3286750.36 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 19:54:40,220 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2857176.0, ans=0.125 2023-10-09 19:54:44,978 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2857176.0, ans=0.2 2023-10-09 19:54:59,670 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2857222.6666666665, ans=0.1 2023-10-09 19:55:17,196 INFO [train.py:1031] (2/4) Epoch 14, batch 27550, loss[loss=0.2589, simple_loss=0.2888, pruned_loss=0.08778, ctc_loss=0.1337, over 10600.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2669, pruned_loss=0.05481, ctc_loss=0.09714, over 3274499.07 frames. ], batch size: 36, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:55:17,558 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2857316.0, ans=0.125 2023-10-09 19:55:18,767 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.63 vs. limit=12.0 2023-10-09 19:55:24,103 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 19:55:45,836 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2857409.3333333335, ans=0.1 2023-10-09 19:56:03,254 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2857456.0, ans=0.125 2023-10-09 19:56:06,317 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0 2023-10-09 19:56:06,683 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.165e+02 3.747e+02 4.293e+02 1.170e+03, threshold=7.493e+02, percent-clipped=3.0 2023-10-09 19:56:18,491 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2857502.6666666665, ans=0.125 2023-10-09 19:56:20,205 INFO [train.py:1031] (2/4) Epoch 14, batch 27600, loss[loss=0.1905, simple_loss=0.2442, pruned_loss=0.04995, ctc_loss=0.09245, over 16542.00 frames. 
], tot_loss[loss=0.2079, simple_loss=0.2655, pruned_loss=0.05555, ctc_loss=0.09819, over 3275510.34 frames. ], batch size: 110, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:56:47,881 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2857642.6666666665, ans=0.125 2023-10-09 19:56:52,828 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2857642.6666666665, ans=0.125 2023-10-09 19:57:10,166 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2857736.0, ans=0.125 2023-10-09 19:57:15,054 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2857736.0, ans=0.05 2023-10-09 19:57:17,284 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2857736.0, ans=0.0 2023-10-09 19:57:17,443 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2023-10-09 19:57:20,821 INFO [scaling.py:979] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.78 vs. limit=8.0 2023-10-09 19:57:22,123 INFO [train.py:1031] (2/4) Epoch 14, batch 27650, loss[loss=0.2463, simple_loss=0.2793, pruned_loss=0.08031, ctc_loss=0.1317, over 16266.00 frames. ], tot_loss[loss=0.2103, simple_loss=0.2684, pruned_loss=0.0562, ctc_loss=0.09958, over 3277740.45 frames. ], batch size: 71, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:57:25,928 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2857782.6666666665, ans=0.125 2023-10-09 19:57:27,339 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0 2023-10-09 19:57:29,632 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2857782.6666666665, ans=0.125 2023-10-09 19:58:08,369 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2857922.6666666665, ans=0.0 2023-10-09 19:58:10,221 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.553e+02 3.178e+02 3.688e+02 4.513e+02 1.131e+03, threshold=7.375e+02, percent-clipped=1.0 2023-10-09 19:58:24,464 INFO [train.py:1031] (2/4) Epoch 14, batch 27700, loss[loss=0.2105, simple_loss=0.2522, pruned_loss=0.06376, ctc_loss=0.1031, over 16665.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2663, pruned_loss=0.05728, ctc_loss=0.1009, over 3286632.59 frames. 
], batch size: 140, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:58:48,952 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2858109.3333333335, ans=0.125 2023-10-09 19:59:11,642 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2858202.6666666665, ans=0.125 2023-10-09 19:59:22,198 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2858202.6666666665, ans=0.1 2023-10-09 19:59:24,061 INFO [train.py:1031] (2/4) Epoch 14, batch 27750, loss[loss=0.1982, simple_loss=0.256, pruned_loss=0.05134, ctc_loss=0.09415, over 16912.00 frames. ], tot_loss[loss=0.2127, simple_loss=0.2654, pruned_loss=0.05915, ctc_loss=0.104, over 3292658.32 frames. ], batch size: 215, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 19:59:37,316 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2858296.0, ans=0.125 2023-10-09 19:59:39,633 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2858296.0, ans=0.125 2023-10-09 19:59:39,679 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2858296.0, ans=0.07 2023-10-09 19:59:41,942 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=2858296.0, ans=10.0 2023-10-09 19:59:48,630 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2858342.6666666665, ans=0.0 2023-10-09 20:00:04,992 WARNING [train.py:1204] (2/4) Exclude cut with ID R0014_M0086-0174-157 from training. Number of frames (before subsampling): 147. Number of frames (after subsampling): 35. Text: 你买多少东西一会儿他就送你这么多东西啊啊三大桶那三大桶得用多少时间就啊. Tokens: ['▁你', '买', '多', '少', '东', '西', '一', '会', '儿', '他', '就', '送', '你', '这', '么', '多', '东', '西', '啊', '啊', '三', '大', '<0xE6>', '<0xA1>', '<0xB6>', '那', '三', '大', '<0xE6>', '<0xA1>', '<0xB6>', '得', '用', '多', '少', '时', '间', '就', '啊']. Number of tokens: 39 2023-10-09 20:00:14,001 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.582e+02 3.399e+02 3.890e+02 4.499e+02 8.877e+02, threshold=7.779e+02, percent-clipped=2.0 2023-10-09 20:00:21,802 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2858436.0, ans=0.125 2023-10-09 20:00:24,188 INFO [train.py:1031] (2/4) Epoch 14, batch 27800, loss[loss=0.2302, simple_loss=0.2842, pruned_loss=0.06513, ctc_loss=0.115, over 16965.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2668, pruned_loss=0.06055, ctc_loss=0.1061, over 3299333.71 frames. ], batch size: 243, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:01:18,851 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2858669.3333333335, ans=0.0 2023-10-09 20:01:22,674 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2858669.3333333335, ans=0.0 2023-10-09 20:01:27,835 INFO [train.py:1031] (2/4) Epoch 14, batch 27850, loss[loss=0.2224, simple_loss=0.2781, pruned_loss=0.05987, ctc_loss=0.1172, over 16698.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2749, pruned_loss=0.06332, ctc_loss=0.1117, over 3304993.64 frames. 
], batch size: 130, lr: 2.54e-03, grad_scale: 1.0 2023-10-09 20:01:55,645 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2858809.3333333335, ans=0.035 2023-10-09 20:02:02,327 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2023-10-09 20:02:10,093 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2858856.0, ans=0.125 2023-10-09 20:02:13,597 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2858856.0, ans=0.0 2023-10-09 20:02:13,668 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2858856.0, ans=0.0 2023-10-09 20:02:18,206 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.539e+02 3.601e+02 4.394e+02 5.369e+02 1.444e+03, threshold=8.787e+02, percent-clipped=3.0 2023-10-09 20:02:27,371 INFO [train.py:1031] (2/4) Epoch 14, batch 27900, loss[loss=0.1749, simple_loss=0.2381, pruned_loss=0.04061, ctc_loss=0.07593, over 16646.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2786, pruned_loss=0.06235, ctc_loss=0.1109, over 3299019.58 frames. ], batch size: 111, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:02:27,698 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2858949.3333333335, ans=0.125 2023-10-09 20:02:51,062 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2858996.0, ans=0.125 2023-10-09 20:02:59,276 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:03:14,057 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2859089.3333333335, ans=0.2 2023-10-09 20:03:22,771 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2859136.0, ans=0.125 2023-10-09 20:03:29,819 INFO [train.py:1031] (2/4) Epoch 14, batch 27950, loss[loss=0.1675, simple_loss=0.2582, pruned_loss=0.02759, ctc_loss=0.05405, over 16913.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2763, pruned_loss=0.05727, ctc_loss=0.1032, over 3296714.23 frames. ], batch size: 258, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:03:43,694 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2859229.3333333335, ans=0.0 2023-10-09 20:04:21,887 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.805e+02 3.200e+02 4.012e+02 8.186e+02, threshold=6.399e+02, percent-clipped=0.0 2023-10-09 20:04:27,801 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2859369.3333333335, ans=0.1 2023-10-09 20:04:29,092 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=22.5 2023-10-09 20:04:31,540 INFO [train.py:1031] (2/4) Epoch 14, batch 28000, loss[loss=0.1997, simple_loss=0.2443, pruned_loss=0.05785, ctc_loss=0.09847, over 16683.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.271, pruned_loss=0.05621, ctc_loss=0.1006, over 3295070.30 frames. 
], batch size: 130, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:04:41,181 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=22.5 2023-10-09 20:04:45,898 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0 2023-10-09 20:04:57,671 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.11 vs. limit=15.0 2023-10-09 20:05:13,114 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2859556.0, ans=0.2 2023-10-09 20:05:21,185 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2859602.6666666665, ans=0.125 2023-10-09 20:05:22,864 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2859602.6666666665, ans=0.0 2023-10-09 20:05:25,623 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2859602.6666666665, ans=0.05 2023-10-09 20:05:33,917 INFO [train.py:1031] (2/4) Epoch 14, batch 28050, loss[loss=0.2083, simple_loss=0.2601, pruned_loss=0.05755, ctc_loss=0.1035, over 16970.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.2683, pruned_loss=0.05801, ctc_loss=0.1032, over 3296753.57 frames. ], batch size: 259, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:05:43,844 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:05:50,299 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2859696.0, ans=0.2 2023-10-09 20:05:53,983 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2859696.0, ans=0.125 2023-10-09 20:05:55,031 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2859696.0, ans=0.0 2023-10-09 20:06:01,224 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2859742.6666666665, ans=0.125 2023-10-09 20:06:08,302 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2023-10-09 20:06:25,713 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.244e+02 3.661e+02 4.395e+02 6.655e+02, threshold=7.321e+02, percent-clipped=2.0 2023-10-09 20:06:33,423 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2859882.6666666665, ans=0.125 2023-10-09 20:06:34,557 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=15.0 2023-10-09 20:06:34,761 INFO [train.py:1031] (2/4) Epoch 14, batch 28100, loss[loss=0.2097, simple_loss=0.261, pruned_loss=0.05915, ctc_loss=0.1002, over 16914.00 frames. ], tot_loss[loss=0.2151, simple_loss=0.2691, pruned_loss=0.05956, ctc_loss=0.1051, over 3297222.99 frames. 
], batch size: 215, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:06:43,772 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2859882.6666666665, ans=0.125 2023-10-09 20:06:59,006 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2859976.0, ans=0.0 2023-10-09 20:07:19,297 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2860022.6666666665, ans=0.07 2023-10-09 20:07:34,845 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2023-10-09 20:07:39,093 INFO [train.py:1031] (2/4) Epoch 14, batch 28150, loss[loss=0.2433, simple_loss=0.3439, pruned_loss=0.05067, ctc_loss=0.1035, over 16860.00 frames. ], tot_loss[loss=0.222, simple_loss=0.2791, pruned_loss=0.06078, ctc_loss=0.1085, over 3291287.26 frames. ], batch size: 328, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:07:40,644 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2860116.0, ans=0.07 2023-10-09 20:08:09,998 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2860209.3333333335, ans=0.0 2023-10-09 20:08:14,869 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2860209.3333333335, ans=0.125 2023-10-09 20:08:30,948 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2860302.6666666665, ans=0.0 2023-10-09 20:08:34,456 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.258e+02 3.630e+02 4.315e+02 7.484e+02, threshold=7.260e+02, percent-clipped=1.0 2023-10-09 20:08:41,525 INFO [train.py:1031] (2/4) Epoch 14, batch 28200, loss[loss=0.2377, simple_loss=0.2924, pruned_loss=0.06693, ctc_loss=0.1227, over 16839.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2881, pruned_loss=0.06363, ctc_loss=0.1136, over 3295592.68 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:09:02,881 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.50 vs. limit=22.5 2023-10-09 20:09:36,488 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2860536.0, ans=0.2 2023-10-09 20:09:43,224 INFO [train.py:1031] (2/4) Epoch 14, batch 28250, loss[loss=0.2729, simple_loss=0.2916, pruned_loss=0.09285, ctc_loss=0.1715, over 16635.00 frames. ], tot_loss[loss=0.2352, simple_loss=0.2905, pruned_loss=0.06646, ctc_loss=0.1175, over 3307372.64 frames. 
], batch size: 416, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:10:32,487 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2860769.3333333335, ans=0.0 2023-10-09 20:10:35,250 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2860769.3333333335, ans=0.0 2023-10-09 20:10:41,218 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.489e+02 3.506e+02 4.003e+02 4.873e+02 1.007e+03, threshold=8.006e+02, percent-clipped=4.0 2023-10-09 20:10:43,169 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2860769.3333333335, ans=0.0 2023-10-09 20:10:46,099 INFO [train.py:1031] (2/4) Epoch 14, batch 28300, loss[loss=0.1991, simple_loss=0.2571, pruned_loss=0.05187, ctc_loss=0.09351, over 16754.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2884, pruned_loss=0.06744, ctc_loss=0.1188, over 3302216.74 frames. ], batch size: 188, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:11:07,598 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2860862.6666666665, ans=10.0 2023-10-09 20:11:15,966 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2860909.3333333335, ans=0.125 2023-10-09 20:11:27,857 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=2860956.0, ans=0.125 2023-10-09 20:11:32,255 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2860956.0, ans=0.125 2023-10-09 20:11:48,188 INFO [train.py:1031] (2/4) Epoch 14, batch 28350, loss[loss=0.2011, simple_loss=0.2518, pruned_loss=0.05533, ctc_loss=0.09908, over 16942.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.2828, pruned_loss=0.06679, ctc_loss=0.1169, over 3309759.14 frames. ], batch size: 86, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:11:49,594 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:12:08,075 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2861096.0, ans=0.125 2023-10-09 20:12:12,056 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:12:12,083 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2861142.6666666665, ans=0.95 2023-10-09 20:12:35,498 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2861189.3333333335, ans=0.1 2023-10-09 20:12:46,030 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.563e+02 3.327e+02 3.829e+02 4.439e+02 7.732e+02, threshold=7.659e+02, percent-clipped=0.0 2023-10-09 20:12:50,327 INFO [train.py:1031] (2/4) Epoch 14, batch 28400, loss[loss=0.1808, simple_loss=0.2462, pruned_loss=0.04283, ctc_loss=0.07444, over 16689.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.2856, pruned_loss=0.06702, ctc_loss=0.1178, over 3301975.49 frames. 
], batch size: 102, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:12:55,688 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2861282.6666666665, ans=0.0 2023-10-09 20:13:08,404 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2861329.3333333335, ans=0.125 2023-10-09 20:13:10,223 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2861329.3333333335, ans=0.0 2023-10-09 20:13:15,842 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:13:24,963 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2861376.0, ans=0.2 2023-10-09 20:13:27,486 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=22.5 2023-10-09 20:13:48,384 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=2861469.3333333335, ans=0.0 2023-10-09 20:13:56,934 INFO [train.py:1031] (2/4) Epoch 14, batch 28450, loss[loss=0.2511, simple_loss=0.3225, pruned_loss=0.06607, ctc_loss=0.1187, over 16813.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2949, pruned_loss=0.06716, ctc_loss=0.1187, over 3301093.42 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:14:11,127 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2861562.6666666665, ans=0.2 2023-10-09 20:14:30,297 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2861609.3333333335, ans=0.0 2023-10-09 20:14:36,168 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2861656.0, ans=0.125 2023-10-09 20:14:39,766 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-10-09 20:14:45,341 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=2861656.0, ans=0.07 2023-10-09 20:14:55,769 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2861702.6666666665, ans=0.125 2023-10-09 20:14:58,087 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.469e+02 3.582e+02 4.557e+02 5.514e+02 1.079e+03, threshold=9.115e+02, percent-clipped=9.0 2023-10-09 20:15:01,355 INFO [train.py:1031] (2/4) Epoch 14, batch 28500, loss[loss=0.2066, simple_loss=0.2851, pruned_loss=0.04804, ctc_loss=0.08001, over 16717.00 frames. ], tot_loss[loss=0.2429, simple_loss=0.3042, pruned_loss=0.06699, ctc_loss=0.119, over 3302583.99 frames. ], batch size: 130, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:15:17,219 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.48 vs. 
limit=15.0 2023-10-09 20:15:22,977 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2861796.0, ans=0.1 2023-10-09 20:15:33,281 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2861842.6666666665, ans=0.125 2023-10-09 20:15:36,379 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2861889.3333333335, ans=0.125 2023-10-09 20:16:03,042 INFO [train.py:1031] (2/4) Epoch 14, batch 28550, loss[loss=0.1704, simple_loss=0.2416, pruned_loss=0.03682, ctc_loss=0.06367, over 16930.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2981, pruned_loss=0.06108, ctc_loss=0.1091, over 3305540.71 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:16:09,469 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=22.5 2023-10-09 20:16:39,272 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2862122.6666666665, ans=0.0 2023-10-09 20:16:39,367 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2862122.6666666665, ans=0.07 2023-10-09 20:16:49,382 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2862122.6666666665, ans=0.0 2023-10-09 20:17:00,496 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.790e+02 3.333e+02 3.902e+02 5.980e+02, threshold=6.666e+02, percent-clipped=0.0 2023-10-09 20:17:02,754 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=22.5 2023-10-09 20:17:03,187 INFO [train.py:1031] (2/4) Epoch 14, batch 28600, loss[loss=0.2113, simple_loss=0.2663, pruned_loss=0.05805, ctc_loss=0.1002, over 16566.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2925, pruned_loss=0.0602, ctc_loss=0.1072, over 3297206.37 frames. ], batch size: 110, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:17:07,312 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2862216.0, ans=0.125 2023-10-09 20:17:09,485 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2862216.0, ans=0.125 2023-10-09 20:17:51,196 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2023-10-09 20:17:56,465 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=2862402.6666666665, ans=0.2 2023-10-09 20:18:05,179 INFO [train.py:1031] (2/4) Epoch 14, batch 28650, loss[loss=0.2519, simple_loss=0.3095, pruned_loss=0.07012, ctc_loss=0.135, over 16726.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2879, pruned_loss=0.05978, ctc_loss=0.1059, over 3294237.68 frames. 
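The Whitening lines compare a per-module whiteness statistic against a fixed limit; corrective gradients only kick in once the metric exceeds the limit, which is why most entries read metric=X vs. limit=Y with X below Y. A plausible form for such a metric (an assumption, not a quote of scaling.py) is d·tr(C²)/tr(C)² for the d-channel feature covariance C: it equals 1.0 exactly when C is proportional to the identity and grows as the eigenvalue spectrum becomes lopsided.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns 1.0 iff the channel
        # covariance is a multiple of the identity, larger otherwise.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        return d * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2

    white = torch.randn(10000, 192)                 # i.i.d. channels
    print(whitening_metric(white))                  # ~= 1.0
    print(whitening_metric(white * torch.linspace(0.1, 3.0, 192)))  # >> 1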
], batch size: 384, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:18:15,333 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2862449.3333333335, ans=0.0 2023-10-09 20:18:20,139 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2862496.0, ans=0.1 2023-10-09 20:18:27,685 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2862496.0, ans=0.125 2023-10-09 20:18:36,449 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2862542.6666666665, ans=0.1 2023-10-09 20:18:44,794 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2862589.3333333335, ans=0.125 2023-10-09 20:18:48,611 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2862589.3333333335, ans=0.0 2023-10-09 20:19:03,577 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2862636.0, ans=0.1 2023-10-09 20:19:05,965 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.314e+02 2.995e+02 3.402e+02 4.215e+02 9.672e+02, threshold=6.804e+02, percent-clipped=2.0 2023-10-09 20:19:07,091 INFO [train.py:1031] (2/4) Epoch 14, batch 28700, loss[loss=0.2292, simple_loss=0.3074, pruned_loss=0.05418, ctc_loss=0.1064, over 16790.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2835, pruned_loss=0.05644, ctc_loss=0.1006, over 3284643.05 frames. ], batch size: 309, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:19:15,128 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2862682.6666666665, ans=0.0 2023-10-09 20:19:31,991 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2862776.0, ans=0.125 2023-10-09 20:19:58,785 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2862869.3333333335, ans=0.1 2023-10-09 20:20:01,883 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2862869.3333333335, ans=0.125 2023-10-09 20:20:07,317 INFO [train.py:1031] (2/4) Epoch 14, batch 28750, loss[loss=0.2104, simple_loss=0.2694, pruned_loss=0.05708, ctc_loss=0.09314, over 16958.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2802, pruned_loss=0.0552, ctc_loss=0.09843, over 3292269.17 frames. 
], batch size: 90, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:20:11,430 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2862916.0, ans=0.125 2023-10-09 20:20:22,702 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2862962.6666666665, ans=0.1 2023-10-09 20:20:31,041 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2863009.3333333335, ans=0.125 2023-10-09 20:20:46,155 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2863056.0, ans=0.125 2023-10-09 20:20:58,194 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=2863102.6666666665, ans=0.2 2023-10-09 20:21:04,040 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2863102.6666666665, ans=0.07 2023-10-09 20:21:09,039 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 3.101e+02 3.665e+02 4.221e+02 6.562e+02, threshold=7.330e+02, percent-clipped=0.0 2023-10-09 20:21:09,066 INFO [train.py:1031] (2/4) Epoch 14, batch 28800, loss[loss=0.2699, simple_loss=0.2929, pruned_loss=0.09237, ctc_loss=0.1551, over 16682.00 frames. ], tot_loss[loss=0.2179, simple_loss=0.28, pruned_loss=0.05747, ctc_loss=0.1022, over 3297542.19 frames. ], batch size: 416, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:21:13,590 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2863149.3333333335, ans=0.1 2023-10-09 20:21:15,517 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0 2023-10-09 20:21:24,512 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=2863196.0, ans=10.0 2023-10-09 20:21:32,862 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2863242.6666666665, ans=0.125 2023-10-09 20:21:32,967 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2863242.6666666665, ans=0.125 2023-10-09 20:21:40,391 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2863242.6666666665, ans=0.125 2023-10-09 20:22:08,074 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=2863336.0, ans=0.0 2023-10-09 20:22:10,814 INFO [train.py:1031] (2/4) Epoch 14, batch 28850, loss[loss=0.2103, simple_loss=0.264, pruned_loss=0.05789, ctc_loss=0.1019, over 16786.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2772, pruned_loss=0.05914, ctc_loss=0.1047, over 3302242.36 frames. ], batch size: 272, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:22:22,157 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2863382.6666666665, ans=0.05 2023-10-09 20:22:22,596 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.84 vs. 
limit=10.0 2023-10-09 20:22:27,479 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2863429.3333333335, ans=0.125 2023-10-09 20:22:47,454 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.68 vs. limit=6.0 2023-10-09 20:22:57,600 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2863522.6666666665, ans=0.0 2023-10-09 20:23:12,086 INFO [train.py:1031] (2/4) Epoch 14, batch 28900, loss[loss=0.2741, simple_loss=0.306, pruned_loss=0.09051, ctc_loss=0.1527, over 16604.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2731, pruned_loss=0.06032, ctc_loss=0.1065, over 3302422.37 frames. ], batch size: 384, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:23:13,124 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.719e+02 3.415e+02 3.744e+02 4.568e+02 8.890e+02, threshold=7.488e+02, percent-clipped=1.0 2023-10-09 20:23:37,642 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2863709.3333333335, ans=0.0 2023-10-09 20:23:52,406 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2863756.0, ans=0.2 2023-10-09 20:23:54,612 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2863756.0, ans=0.125 2023-10-09 20:24:13,689 INFO [train.py:1031] (2/4) Epoch 14, batch 28950, loss[loss=0.2159, simple_loss=0.2808, pruned_loss=0.05603, ctc_loss=0.09731, over 16832.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2705, pruned_loss=0.06014, ctc_loss=0.1048, over 3297339.79 frames. ], batch size: 309, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:24:56,391 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2863989.3333333335, ans=0.125 2023-10-09 20:25:00,053 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.90 vs. limit=6.0 2023-10-09 20:25:01,138 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0 2023-10-09 20:25:15,070 INFO [train.py:1031] (2/4) Epoch 14, batch 29000, loss[loss=0.194, simple_loss=0.2767, pruned_loss=0.04166, ctc_loss=0.07016, over 16942.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2675, pruned_loss=0.05844, ctc_loss=0.1011, over 3294790.38 frames. ], batch size: 258, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:25:17,200 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.221e+02 3.225e+02 3.785e+02 4.643e+02 9.976e+02, threshold=7.570e+02, percent-clipped=3.0 2023-10-09 20:25:32,075 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2864129.3333333335, ans=0.125 2023-10-09 20:26:07,412 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.11 vs. 
limit=12.0 2023-10-09 20:26:08,052 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2864269.3333333335, ans=0.0 2023-10-09 20:26:14,253 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2864316.0, ans=0.125 2023-10-09 20:26:15,127 INFO [train.py:1031] (2/4) Epoch 14, batch 29050, loss[loss=0.2505, simple_loss=0.309, pruned_loss=0.07191, ctc_loss=0.1205, over 16910.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2697, pruned_loss=0.05829, ctc_loss=0.101, over 3303530.36 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:27:03,698 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2864502.6666666665, ans=0.125 2023-10-09 20:27:16,985 INFO [train.py:1031] (2/4) Epoch 14, batch 29100, loss[loss=0.2502, simple_loss=0.2977, pruned_loss=0.07415, ctc_loss=0.1359, over 16897.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.2734, pruned_loss=0.06163, ctc_loss=0.1066, over 3304699.20 frames. ], batch size: 292, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:27:17,290 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2864549.3333333335, ans=0.125 2023-10-09 20:27:20,251 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.727e+02 3.447e+02 3.769e+02 4.635e+02 6.729e+02, threshold=7.539e+02, percent-clipped=0.0 2023-10-09 20:27:41,224 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2864642.6666666665, ans=0.125 2023-10-09 20:27:50,318 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:27:55,743 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2864689.3333333335, ans=0.125 2023-10-09 20:28:03,716 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2864689.3333333335, ans=0.0 2023-10-09 20:28:16,899 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0 2023-10-09 20:28:18,205 INFO [train.py:1031] (2/4) Epoch 14, batch 29150, loss[loss=0.2775, simple_loss=0.3052, pruned_loss=0.09183, ctc_loss=0.1654, over 16807.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2785, pruned_loss=0.06403, ctc_loss=0.1112, over 3298186.55 frames. 
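Each train.py progress line reports the transducer's simple loss, its pruned loss, the auxiliary CTC loss, and a combined 'loss'. The logged totals are consistent with a post-warmup weighting of roughly loss = 0.5·simple_loss + pruned_loss + 0.2·ctc_loss; the check below reproduces the batch 29050 entry just above. The scales are configuration values and the warmup ramp is omitted, so treat this as a reading of the numbers rather than a quote of train.py.

    def combined_loss(simple_loss, pruned_loss, ctc_loss,
                      simple_loss_scale, ctc_loss_scale):
        # Assumed post-warmup combination of the logged per-batch terms.
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + ctc_loss_scale * ctc_loss)

    # Batch 29050 above: loss=0.2505, simple_loss=0.309,
    # pruned_loss=0.07191, ctc_loss=0.1205.
    print(combined_loss(0.309, 0.07191, 0.1205, 0.5, 0.2))  # -> 0.25051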
], batch size: 384, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:28:20,129 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2864782.6666666665, ans=0.09899494936611666 2023-10-09 20:28:36,707 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2864829.3333333335, ans=0.1 2023-10-09 20:28:42,110 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2864829.3333333335, ans=0.125 2023-10-09 20:28:51,930 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2864876.0, ans=0.1 2023-10-09 20:29:15,478 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2864969.3333333335, ans=0.0 2023-10-09 20:29:22,662 INFO [train.py:1031] (2/4) Epoch 14, batch 29200, loss[loss=0.244, simple_loss=0.3217, pruned_loss=0.06087, ctc_loss=0.1116, over 16466.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.2815, pruned_loss=0.06446, ctc_loss=0.1124, over 3292855.59 frames. ], batch size: 416, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:29:28,272 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.765e+02 3.299e+02 3.814e+02 4.330e+02 6.435e+02, threshold=7.628e+02, percent-clipped=0.0 2023-10-09 20:29:31,432 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=4.41 vs. limit=15.0 2023-10-09 20:29:34,776 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2865062.6666666665, ans=0.0 2023-10-09 20:29:48,356 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2865109.3333333335, ans=0.125 2023-10-09 20:30:09,428 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2865156.0, ans=0.125 2023-10-09 20:30:24,793 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2865202.6666666665, ans=0.04949747468305833 2023-10-09 20:30:27,647 INFO [train.py:1031] (2/4) Epoch 14, batch 29250, loss[loss=0.216, simple_loss=0.2827, pruned_loss=0.05579, ctc_loss=0.09421, over 16790.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2842, pruned_loss=0.06277, ctc_loss=0.1099, over 3287195.73 frames. ], batch size: 176, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:30:28,041 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2865249.3333333335, ans=0.09899494936611666 2023-10-09 20:30:42,624 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2865296.0, ans=0.125 2023-10-09 20:30:50,424 INFO [scaling.py:979] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.86 vs. 
limit=8.0 2023-10-09 20:30:53,976 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2865342.6666666665, ans=0.1 2023-10-09 20:31:21,962 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2865436.0, ans=0.125 2023-10-09 20:31:23,471 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2865436.0, ans=0.0 2023-10-09 20:31:26,654 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=2865436.0, ans=0.2 2023-10-09 20:31:32,755 INFO [train.py:1031] (2/4) Epoch 14, batch 29300, loss[loss=0.2255, simple_loss=0.3022, pruned_loss=0.05565, ctc_loss=0.09401, over 16985.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2938, pruned_loss=0.06406, ctc_loss=0.1127, over 3297626.23 frames. ], batch size: 216, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:31:38,517 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.349e+02 3.153e+02 3.767e+02 4.679e+02 9.052e+02, threshold=7.535e+02, percent-clipped=4.0 2023-10-09 20:31:42,958 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.97 vs. limit=15.0 2023-10-09 20:31:48,139 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=22.5 2023-10-09 20:32:16,541 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2865622.6666666665, ans=0.5 2023-10-09 20:32:27,820 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2865669.3333333335, ans=0.125 2023-10-09 20:32:33,858 INFO [train.py:1031] (2/4) Epoch 14, batch 29350, loss[loss=0.1917, simple_loss=0.2546, pruned_loss=0.04691, ctc_loss=0.0872, over 16902.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2919, pruned_loss=0.06478, ctc_loss=0.1139, over 3302732.31 frames. ], batch size: 202, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:32:45,146 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2865762.6666666665, ans=0.09899494936611666 2023-10-09 20:33:00,322 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=2865809.3333333335, ans=0.5 2023-10-09 20:33:17,770 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2865856.0, ans=0.125 2023-10-09 20:33:26,736 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2865902.6666666665, ans=0.125 2023-10-09 20:33:36,259 INFO [train.py:1031] (2/4) Epoch 14, batch 29400, loss[loss=0.1799, simple_loss=0.2395, pruned_loss=0.04385, ctc_loss=0.08133, over 16644.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2841, pruned_loss=0.06197, ctc_loss=0.1092, over 3296011.25 frames. 
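The balancer entries (prob, min_positive, max_positive, min_abs, max_abs) belong to modules that keep per-channel activation statistics inside a target range by injecting a small corrective term into the backward pass, applied stochastically with probability prob. The toy autograd function below illustrates that mechanism under stated assumptions; the real Balancer in scaling.py is considerably more elaborate.

    import torch

    class ToyBalancer(torch.autograd.Function):
        # Identity forward; backward adds a nudge that pushes each
        # channel's mean |activation| toward [min_abs, max_abs].
        @staticmethod
        def forward(ctx, x, min_abs, max_abs, strength):
            ctx.save_for_backward(x)
            ctx.cfg = (min_abs, max_abs, strength)
            return x

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            min_abs, max_abs, strength = ctx.cfg
            mean_abs = x.abs().mean(dim=0, keepdim=True)
            # +1 for channels that are too small, -1 for too large.
            direction = ((mean_abs < min_abs).float()
                         - (mean_abs > max_abs).float())
            # Descending on (grad_out + nudge) grows/shrinks |x| as needed.
            nudge = -strength * direction * x.sign()
            return grad_out + nudge, None, None, None

    x = (0.01 * torch.randn(16, 192)).requires_grad_()  # far below min_abs
    ToyBalancer.apply(x, 0.2, 10.0, 1e-4).sum().backward()
    print(x.grad[0, :4])  # gradient now carries the balancing nudge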
], batch size: 151, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:33:44,077 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 2.925e+02 3.429e+02 4.063e+02 7.311e+02, threshold=6.858e+02, percent-clipped=0.0 2023-10-09 20:33:47,355 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2865949.3333333335, ans=0.0 2023-10-09 20:34:06,637 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2866042.6666666665, ans=0.07 2023-10-09 20:34:40,000 INFO [train.py:1031] (2/4) Epoch 14, batch 29450, loss[loss=0.2134, simple_loss=0.291, pruned_loss=0.04864, ctc_loss=0.09621, over 16440.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2791, pruned_loss=0.05753, ctc_loss=0.1023, over 3299709.17 frames. ], batch size: 416, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:34:58,253 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2866229.3333333335, ans=0.125 2023-10-09 20:35:02,979 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2866229.3333333335, ans=0.1 2023-10-09 20:35:21,831 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2866322.6666666665, ans=0.2 2023-10-09 20:35:43,431 INFO [train.py:1031] (2/4) Epoch 14, batch 29500, loss[loss=0.2312, simple_loss=0.2941, pruned_loss=0.06158, ctc_loss=0.1129, over 16984.00 frames. ], tot_loss[loss=0.2161, simple_loss=0.2808, pruned_loss=0.05572, ctc_loss=0.1001, over 3297862.96 frames. ], batch size: 86, lr: 2.54e-03, grad_scale: 8.0 2023-10-09 20:35:51,252 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.027e+02 2.902e+02 3.659e+02 4.459e+02 8.520e+02, threshold=7.319e+02, percent-clipped=6.0 2023-10-09 20:35:57,483 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2866462.6666666665, ans=0.125 2023-10-09 20:36:17,186 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0 2023-10-09 20:36:44,252 INFO [train.py:1031] (2/4) Epoch 14, batch 29550, loss[loss=0.2269, simple_loss=0.2542, pruned_loss=0.07318, ctc_loss=0.1332, over 16422.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2769, pruned_loss=0.05598, ctc_loss=0.1003, over 3301920.83 frames. 
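The grad_scale value at the end of each progress line drifts among powers of two (2.0, 4.0, 8.0 above) because fp16 training uses a dynamic loss scale: it is halved whenever scaled gradients overflow and grown again after a run of clean steps. A generic torch.cuda.amp illustration of that behaviour follows; the model, data, and init_scale are stand-ins, and the recipe's own optimizer wrapper may manage the scale differently.

    import torch

    model = torch.nn.Linear(8, 1).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler(init_scale=2.0)  # assumed start

    for step in range(5):
        opt.zero_grad()
        x = torch.randn(4, 8, device="cuda")
        with torch.cuda.amp.autocast():
            loss = model(x).pow(2).mean()
        scaler.scale(loss).backward()  # backprop the scaled loss
        scaler.step(opt)               # unscales; skips step on inf/nan
        scaler.update()                # halve on overflow, grow otherwise
        print("grad_scale:", scaler.get_scale())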
], batch size: 417, lr: 2.54e-03, grad_scale: 2.0 2023-10-09 20:36:46,598 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=2866649.3333333335, ans=10.0 2023-10-09 20:36:54,530 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2866649.3333333335, ans=15.0 2023-10-09 20:36:55,338 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:36:55,357 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2866696.0, ans=0.125 2023-10-09 20:37:09,672 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2866742.6666666665, ans=0.0 2023-10-09 20:37:34,884 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2866836.0, ans=0.125 2023-10-09 20:37:44,951 INFO [train.py:1031] (2/4) Epoch 14, batch 29600, loss[loss=0.2391, simple_loss=0.3005, pruned_loss=0.06571, ctc_loss=0.1157, over 16892.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2727, pruned_loss=0.05664, ctc_loss=0.1013, over 3299899.26 frames. ], batch size: 258, lr: 2.54e-03, grad_scale: 4.0 2023-10-09 20:37:48,592 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2866882.6666666665, ans=0.1 2023-10-09 20:37:48,605 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=2866882.6666666665, ans=0.125 2023-10-09 20:37:54,765 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 3.047e+02 3.582e+02 4.028e+02 6.950e+02, threshold=7.163e+02, percent-clipped=0.0 2023-10-09 20:38:07,002 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2866929.3333333335, ans=0.125 2023-10-09 20:38:25,078 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2867022.6666666665, ans=0.125 2023-10-09 20:38:46,701 INFO [train.py:1031] (2/4) Epoch 14, batch 29650, loss[loss=0.2275, simple_loss=0.2974, pruned_loss=0.05705, ctc_loss=0.1087, over 16867.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2768, pruned_loss=0.0577, ctc_loss=0.1031, over 3306863.54 frames. ], batch size: 243, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:38:59,393 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=2867162.6666666665, ans=0.0 2023-10-09 20:39:09,455 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2867162.6666666665, ans=0.2 2023-10-09 20:39:33,548 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2867256.0, ans=0.2 2023-10-09 20:39:38,653 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2023-10-09 20:39:48,383 INFO [train.py:1031] (2/4) Epoch 14, batch 29700, loss[loss=0.2254, simple_loss=0.289, pruned_loss=0.06118, ctc_loss=0.09856, over 16898.00 frames. 
], tot_loss[loss=0.2219, simple_loss=0.2798, pruned_loss=0.06054, ctc_loss=0.1075, over 3306032.89 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:39:48,766 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2867349.3333333335, ans=0.09899494936611666 2023-10-09 20:39:59,295 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.566e+02 3.266e+02 3.794e+02 4.396e+02 1.319e+03, threshold=7.588e+02, percent-clipped=2.0 2023-10-09 20:40:01,341 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2867396.0, ans=0.0 2023-10-09 20:40:05,449 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2867396.0, ans=0.1 2023-10-09 20:40:16,719 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2867442.6666666665, ans=0.1 2023-10-09 20:40:50,195 INFO [train.py:1031] (2/4) Epoch 14, batch 29750, loss[loss=0.2671, simple_loss=0.3197, pruned_loss=0.07891, ctc_loss=0.1415, over 16997.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2813, pruned_loss=0.06266, ctc_loss=0.111, over 3314330.46 frames. ], batch size: 86, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:41:27,309 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.07 vs. limit=15.0 2023-10-09 20:41:30,811 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2023-10-09 20:41:39,503 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2867769.3333333335, ans=0.125 2023-10-09 20:41:53,580 INFO [train.py:1031] (2/4) Epoch 14, batch 29800, loss[loss=0.2369, simple_loss=0.2805, pruned_loss=0.07126, ctc_loss=0.1268, over 16771.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2818, pruned_loss=0.06384, ctc_loss=0.1127, over 3318673.29 frames. ], batch size: 353, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:41:59,816 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2867816.0, ans=0.125 2023-10-09 20:42:05,752 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.683e+02 3.252e+02 3.750e+02 4.690e+02 1.156e+03, threshold=7.500e+02, percent-clipped=2.0 2023-10-09 20:42:12,144 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=15.0 2023-10-09 20:42:15,985 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2867862.6666666665, ans=0.1 2023-10-09 20:42:17,081 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2867909.3333333335, ans=0.125 2023-10-09 20:42:26,675 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:42:38,495 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.58 vs. 
limit=15.0 2023-10-09 20:42:56,943 INFO [train.py:1031] (2/4) Epoch 14, batch 29850, loss[loss=0.2487, simple_loss=0.2985, pruned_loss=0.07523, ctc_loss=0.1213, over 16638.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2912, pruned_loss=0.06541, ctc_loss=0.1156, over 3315787.85 frames. ], batch size: 111, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:43:08,337 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2868049.3333333335, ans=0.125 2023-10-09 20:43:09,596 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.41 vs. limit=15.0 2023-10-09 20:43:45,131 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2868189.3333333335, ans=0.2 2023-10-09 20:43:56,846 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2868236.0, ans=0.025 2023-10-09 20:44:02,050 INFO [train.py:1031] (2/4) Epoch 14, batch 29900, loss[loss=0.2571, simple_loss=0.3184, pruned_loss=0.07287, ctc_loss=0.1255, over 16855.00 frames. ], tot_loss[loss=0.2375, simple_loss=0.2938, pruned_loss=0.067, ctc_loss=0.1179, over 3305676.64 frames. ], batch size: 242, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:44:08,989 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2868282.6666666665, ans=0.0 2023-10-09 20:44:15,806 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.669e+02 3.520e+02 3.961e+02 4.963e+02 1.132e+03, threshold=7.922e+02, percent-clipped=8.0 2023-10-09 20:44:27,580 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2023-10-09 20:44:35,220 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2868376.0, ans=0.125 2023-10-09 20:44:49,644 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.82 vs. limit=15.0 2023-10-09 20:44:57,777 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.12 vs. limit=12.0 2023-10-09 20:44:59,105 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2868469.3333333335, ans=0.0 2023-10-09 20:45:04,794 INFO [train.py:1031] (2/4) Epoch 14, batch 29950, loss[loss=0.2743, simple_loss=0.3639, pruned_loss=0.06847, ctc_loss=0.1196, over 15179.00 frames. ], tot_loss[loss=0.2416, simple_loss=0.298, pruned_loss=0.06865, ctc_loss=0.1198, over 3293871.71 frames. 
], batch size: 527, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:45:06,216 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2868516.0, ans=0.125 2023-10-09 20:45:12,362 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2868516.0, ans=0.1 2023-10-09 20:45:23,516 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=2868562.6666666665, ans=22.5 2023-10-09 20:45:24,148 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2868562.6666666665, ans=0.0 2023-10-09 20:45:24,249 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2868562.6666666665, ans=0.1 2023-10-09 20:45:33,905 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.15 vs. limit=12.0 2023-10-09 20:45:55,529 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=12.0 2023-10-09 20:46:01,597 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2868702.6666666665, ans=0.2 2023-10-09 20:46:05,484 INFO [train.py:1031] (2/4) Epoch 14, batch 30000, loss[loss=0.2498, simple_loss=0.3039, pruned_loss=0.07363, ctc_loss=0.121, over 16721.00 frames. ], tot_loss[loss=0.2397, simple_loss=0.2998, pruned_loss=0.06651, ctc_loss=0.1163, over 3295484.57 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 20:46:05,484 INFO [train.py:1054] (2/4) Computing validation loss 2023-10-09 20:46:14,860 INFO [zipformer.py:1853] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.0011, 2.3122, 4.4298, 1.6419], device='cuda:2') 2023-10-09 20:46:22,664 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2308, simple_loss=0.3022, pruned_loss=0.06118, ctc_loss=0.09249, over 1796401.00 frames. 2023-10-09 20:46:22,665 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB 2023-10-09 20:46:25,431 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.27 vs. limit=22.5 2023-10-09 20:46:36,770 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.188e+02 3.941e+02 4.902e+02 7.309e+02, threshold=7.881e+02, percent-clipped=0.0 2023-10-09 20:46:37,128 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2868796.0, ans=0.125 2023-10-09 20:46:38,205 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2868796.0, ans=0.0 2023-10-09 20:46:43,976 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2023-10-09 20:46:48,234 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2868842.6666666665, ans=0.125 2023-10-09 20:46:58,600 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. 
limit=6.0 2023-10-09 20:47:24,789 INFO [train.py:1031] (2/4) Epoch 14, batch 30050, loss[loss=0.1889, simple_loss=0.2596, pruned_loss=0.04344, ctc_loss=0.07839, over 16662.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.2975, pruned_loss=0.0658, ctc_loss=0.1153, over 3291360.32 frames. ], batch size: 151, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 20:47:25,288 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=22.5 2023-10-09 20:47:33,115 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2868982.6666666665, ans=0.125 2023-10-09 20:47:38,971 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=2869029.3333333335, ans=15.0 2023-10-09 20:47:53,972 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2869076.0, ans=0.1 2023-10-09 20:48:25,618 INFO [train.py:1031] (2/4) Epoch 14, batch 30100, loss[loss=0.2727, simple_loss=0.2668, pruned_loss=0.1055, ctc_loss=0.1689, over 10675.00 frames. ], tot_loss[loss=0.2348, simple_loss=0.2965, pruned_loss=0.06402, ctc_loss=0.1128, over 3278482.90 frames. ], batch size: 35, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:48:41,531 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2023-10-09 20:48:43,029 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.464e+02 3.136e+02 3.710e+02 4.656e+02 9.667e+02, threshold=7.419e+02, percent-clipped=2.0 2023-10-09 20:48:43,695 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.00 vs. limit=15.0 2023-10-09 20:48:48,380 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 20:48:49,943 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2869309.3333333335, ans=0.0 2023-10-09 20:48:58,546 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2869309.3333333335, ans=0.125 2023-10-09 20:49:13,005 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2869356.0, ans=0.1 2023-10-09 20:49:15,965 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=22.5 2023-10-09 20:49:27,211 INFO [train.py:1031] (2/4) Epoch 14, batch 30150, loss[loss=0.2626, simple_loss=0.3198, pruned_loss=0.07641, ctc_loss=0.1315, over 16832.00 frames. ], tot_loss[loss=0.2339, simple_loss=0.2959, pruned_loss=0.06352, ctc_loss=0.1121, over 3277080.35 frames. 
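At batch 30000 above, the loop pauses to compute a validation loss over a fixed held-out set before resuming ('Computing validation loss', the 'validation:' totals, and the peak-memory report). Below is a schematic of that periodic pass, assuming a model that returns a loss and a frame count per batch; the helper names are illustrative, not train.py's.

    import torch

    def compute_validation_loss(model, valid_loader, device="cuda"):
        # Frame-weighted average loss over the validation set, grads off.
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = model(batch)  # stand-in interface
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        # The peak-memory line in the log plausibly comes from
        # torch.cuda.max_memory_allocated(device), reported in MB.
        return tot_loss / max(tot_frames, 1)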
], batch size: 95, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:49:33,396 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2869449.3333333335, ans=0.125 2023-10-09 20:49:37,134 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2869449.3333333335, ans=0.1 2023-10-09 20:50:22,024 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2869636.0, ans=0.0 2023-10-09 20:50:27,444 INFO [train.py:1031] (2/4) Epoch 14, batch 30200, loss[loss=0.1729, simple_loss=0.244, pruned_loss=0.03781, ctc_loss=0.06544, over 10579.00 frames. ], tot_loss[loss=0.2377, simple_loss=0.2976, pruned_loss=0.06582, ctc_loss=0.1157, over 3275861.72 frames. ], batch size: 35, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:50:45,496 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.358e+02 3.166e+02 3.687e+02 4.321e+02 7.960e+02, threshold=7.375e+02, percent-clipped=2.0 2023-10-09 20:51:18,873 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2869869.3333333335, ans=0.125 2023-10-09 20:51:28,749 INFO [train.py:1031] (2/4) Epoch 14, batch 30250, loss[loss=0.2616, simple_loss=0.3246, pruned_loss=0.07254, ctc_loss=0.1338, over 16260.00 frames. ], tot_loss[loss=0.2397, simple_loss=0.2979, pruned_loss=0.06722, ctc_loss=0.1177, over 3271648.08 frames. ], batch size: 463, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 20:51:32,887 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2869916.0, ans=0.125 2023-10-09 20:51:35,648 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2869916.0, ans=0.125 2023-10-09 20:51:37,850 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=2869916.0, ans=0.125 2023-10-09 20:51:38,934 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=2869916.0, ans=0.07 2023-10-09 20:51:44,201 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=15.0 2023-10-09 20:51:48,001 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2869962.6666666665, ans=0.125 2023-10-09 20:51:49,945 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.96 vs. limit=15.0 2023-10-09 20:52:02,115 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=2870009.3333333335, ans=0.035 2023-10-09 20:52:06,535 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2023-10-09 20:52:30,641 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2870102.6666666665, ans=0.125 2023-10-09 20:52:32,398 INFO [train.py:1031] (2/4) Epoch 14, batch 30300, loss[loss=0.2494, simple_loss=0.3045, pruned_loss=0.07229, ctc_loss=0.1242, over 16835.00 frames. 
], tot_loss[loss=0.2451, simple_loss=0.3012, pruned_loss=0.07001, ctc_loss=0.1224, over 3279179.68 frames. ], batch size: 202, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:52:51,951 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.987e+02 3.410e+02 3.951e+02 4.930e+02 7.071e+02, threshold=7.902e+02, percent-clipped=0.0
2023-10-09 20:53:11,454 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2870289.3333333335, ans=0.1
2023-10-09 20:53:14,435 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.22 vs. limit=15.0
2023-10-09 20:53:33,821 INFO [train.py:1031] (2/4) Epoch 14, batch 30350, loss[loss=0.2327, simple_loss=0.281, pruned_loss=0.06959, ctc_loss=0.1133, over 16688.00 frames. ], tot_loss[loss=0.2444, simple_loss=0.2989, pruned_loss=0.07038, ctc_loss=0.1229, over 3274723.28 frames. ], batch size: 111, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:54:04,162 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=2870476.0, ans=0.09899494936611666
2023-10-09 20:54:04,496 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0
2023-10-09 20:54:24,662 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2870569.3333333335, ans=0.125
2023-10-09 20:54:24,746 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2870569.3333333335, ans=0.0
2023-10-09 20:54:35,114 INFO [train.py:1031] (2/4) Epoch 14, batch 30400, loss[loss=0.2171, simple_loss=0.2704, pruned_loss=0.0609, ctc_loss=0.1048, over 16871.00 frames. ], tot_loss[loss=0.2424, simple_loss=0.295, pruned_loss=0.07037, ctc_loss=0.1228, over 3278788.77 frames. ], batch size: 121, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:54:35,333 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2870616.0, ans=0.0
2023-10-09 20:54:48,881 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2870662.6666666665, ans=0.1
2023-10-09 20:54:48,940 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2870662.6666666665, ans=0.1
2023-10-09 20:54:54,415 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.508e+02 3.255e+02 4.087e+02 4.759e+02 9.430e+02, threshold=8.174e+02, percent-clipped=1.0
2023-10-09 20:55:28,976 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2870802.6666666665, ans=0.125
2023-10-09 20:55:31,023 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2870802.6666666665, ans=0.125
2023-10-09 20:55:35,537 INFO [train.py:1031] (2/4) Epoch 14, batch 30450, loss[loss=0.1847, simple_loss=0.225, pruned_loss=0.05164, ctc_loss=0.1026, over 16069.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.2878, pruned_loss=0.06916, ctc_loss=0.1209, over 3283847.89 frames. ], batch size: 463, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:56:00,494 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.60 vs. limit=15.0
2023-10-09 20:56:16,808 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2870989.3333333335, ans=0.125
2023-10-09 20:56:22,115 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.10 vs. limit=15.0
2023-10-09 20:56:28,123 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2871036.0, ans=0.1
2023-10-09 20:56:38,904 INFO [train.py:1031] (2/4) Epoch 14, batch 30500, loss[loss=0.2032, simple_loss=0.2659, pruned_loss=0.05255, ctc_loss=0.08856, over 16666.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2835, pruned_loss=0.06689, ctc_loss=0.1168, over 3295598.57 frames. ], batch size: 130, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:56:39,758 INFO [scaling.py:979] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.77 vs. limit=5.0
2023-10-09 20:56:46,161 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2871082.6666666665, ans=0.125
2023-10-09 20:57:00,011 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+02 3.155e+02 3.681e+02 4.532e+02 7.008e+02, threshold=7.361e+02, percent-clipped=0.0
2023-10-09 20:57:07,388 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2871176.0, ans=0.125
2023-10-09 20:57:16,339 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2871222.6666666665, ans=0.125
2023-10-09 20:57:30,073 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2871269.3333333335, ans=0.125
2023-10-09 20:57:37,091 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2871269.3333333335, ans=0.125
2023-10-09 20:57:41,471 INFO [train.py:1031] (2/4) Epoch 14, batch 30550, loss[loss=0.2247, simple_loss=0.2717, pruned_loss=0.06494, ctc_loss=0.1198, over 15374.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.2905, pruned_loss=0.06665, ctc_loss=0.1168, over 3302532.86 frames. ], batch size: 526, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:58:14,101 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2871409.3333333335, ans=0.2
2023-10-09 20:58:18,611 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0
2023-10-09 20:58:25,624 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2871456.0, ans=0.125
2023-10-09 20:58:38,660 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. limit=10.0
2023-10-09 20:58:41,006 INFO [train.py:1031] (2/4) Epoch 14, batch 30600, loss[loss=0.2049, simple_loss=0.2598, pruned_loss=0.05667, ctc_loss=0.09192, over 16768.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2887, pruned_loss=0.06765, ctc_loss=0.1181, over 3308040.12 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 20:58:47,754 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2871549.3333333335, ans=0.125
2023-10-09 20:59:01,073 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+02 3.194e+02 3.624e+02 4.233e+02 1.074e+03, threshold=7.249e+02, percent-clipped=2.0
2023-10-09 20:59:40,076 INFO [train.py:1031] (2/4) Epoch 14, batch 30650, loss[loss=0.1802, simple_loss=0.2439, pruned_loss=0.04286, ctc_loss=0.07704, over 16806.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2817, pruned_loss=0.06585, ctc_loss=0.1149, over 3299172.61 frames. ], batch size: 228, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 20:59:41,113 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2871782.6666666665, ans=0.07
2023-10-09 20:59:46,818 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2871782.6666666665, ans=0.0
2023-10-09 21:00:23,763 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2871922.6666666665, ans=0.125
2023-10-09 21:00:26,652 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2871922.6666666665, ans=0.0
2023-10-09 21:00:29,283 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2871969.3333333335, ans=0.0
2023-10-09 21:00:41,982 INFO [train.py:1031] (2/4) Epoch 14, batch 30700, loss[loss=0.1662, simple_loss=0.2217, pruned_loss=0.04183, ctc_loss=0.06751, over 16691.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2761, pruned_loss=0.06208, ctc_loss=0.1087, over 3295190.74 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:00:43,985 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2872016.0, ans=0.125
2023-10-09 21:00:52,952 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.75 vs. limit=12.0
2023-10-09 21:01:06,276 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.279e+02 3.086e+02 3.703e+02 4.369e+02 9.445e+02, threshold=7.405e+02, percent-clipped=1.0
2023-10-09 21:01:21,926 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2872156.0, ans=0.1
2023-10-09 21:01:43,127 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.19 vs. limit=10.0
2023-10-09 21:01:43,692 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2872202.6666666665, ans=0.2
2023-10-09 21:01:46,062 INFO [train.py:1031] (2/4) Epoch 14, batch 30750, loss[loss=0.2528, simple_loss=0.3172, pruned_loss=0.07138, ctc_loss=0.1142, over 16824.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2781, pruned_loss=0.06229, ctc_loss=0.1077, over 3289654.03 frames. ], batch size: 308, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:02:22,672 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2872342.6666666665, ans=0.0
2023-10-09 21:02:32,471 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0
2023-10-09 21:02:36,768 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=2872436.0, ans=0.02
2023-10-09 21:02:50,793 INFO [train.py:1031] (2/4) Epoch 14, batch 30800, loss[loss=0.2256, simple_loss=0.3031, pruned_loss=0.05414, ctc_loss=0.09978, over 16859.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2881, pruned_loss=0.06442, ctc_loss=0.1116, over 3288452.36 frames. ], batch size: 202, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:03:16,581 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.601e+02 3.887e+02 4.535e+02 5.921e+02 9.056e+02, threshold=9.070e+02, percent-clipped=5.0
2023-10-09 21:03:42,491 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=2872669.3333333335, ans=15.0
2023-10-09 21:03:49,922 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2872669.3333333335, ans=0.2
2023-10-09 21:03:54,404 INFO [train.py:1031] (2/4) Epoch 14, batch 30850, loss[loss=0.2259, simple_loss=0.27, pruned_loss=0.06724, ctc_loss=0.1181, over 16759.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2872, pruned_loss=0.06429, ctc_loss=0.1117, over 3296587.12 frames. ], batch size: 292, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:03:59,561 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2872716.0, ans=0.125
2023-10-09 21:04:08,915 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=2872762.6666666665, ans=0.125
2023-10-09 21:04:17,638 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2872809.3333333335, ans=0.125
2023-10-09 21:04:30,252 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2872856.0, ans=0.125
2023-10-09 21:04:45,405 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2872902.6666666665, ans=0.04949747468305833
2023-10-09 21:04:55,239 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.28 vs. limit=15.0
2023-10-09 21:04:56,199 INFO [train.py:1031] (2/4) Epoch 14, batch 30900, loss[loss=0.1584, simple_loss=0.2296, pruned_loss=0.03116, ctc_loss=0.06199, over 16820.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2793, pruned_loss=0.06243, ctc_loss=0.1088, over 3290130.17 frames. ], batch size: 202, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:05:00,353 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2872949.3333333335, ans=0.125
2023-10-09 21:05:14,417 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2872996.0, ans=0.125
2023-10-09 21:05:20,046 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.228e+02 3.130e+02 3.635e+02 4.208e+02 6.076e+02, threshold=7.270e+02, percent-clipped=0.0
2023-10-09 21:05:26,499 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.81 vs. limit=22.5
2023-10-09 21:05:36,337 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:05:56,103 INFO [train.py:1031] (2/4) Epoch 14, batch 30950, loss[loss=0.2967, simple_loss=0.3085, pruned_loss=0.1054, ctc_loss=0.1851, over 16840.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2752, pruned_loss=0.06138, ctc_loss=0.1073, over 3286376.64 frames. ], batch size: 384, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:05:59,353 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.57 vs. limit=15.0
2023-10-09 21:06:35,231 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2873322.6666666665, ans=0.125
2023-10-09 21:06:58,772 INFO [train.py:1031] (2/4) Epoch 14, batch 31000, loss[loss=0.1941, simple_loss=0.2303, pruned_loss=0.05891, ctc_loss=0.09999, over 10772.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2763, pruned_loss=0.06233, ctc_loss=0.1091, over 3288137.90 frames. ], batch size: 36, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:07:10,460 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2873462.6666666665, ans=0.0
2023-10-09 21:07:21,026 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=2873462.6666666665, ans=0.0
2023-10-09 21:07:25,521 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.598e+02 3.246e+02 3.902e+02 4.981e+02 7.271e+02, threshold=7.805e+02, percent-clipped=1.0
2023-10-09 21:07:30,746 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2873509.3333333335, ans=0.2
2023-10-09 21:07:32,138 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=22.5
2023-10-09 21:07:48,534 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.07 vs. limit=15.0
2023-10-09 21:07:58,592 INFO [train.py:1031] (2/4) Epoch 14, batch 31050, loss[loss=0.1649, simple_loss=0.2317, pruned_loss=0.03678, ctc_loss=0.06115, over 16744.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.273, pruned_loss=0.05921, ctc_loss=0.1035, over 3287463.07 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:08:19,054 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2873696.0, ans=0.025
2023-10-09 21:08:19,336 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.79 vs. limit=15.0
2023-10-09 21:08:27,281 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2873742.6666666665, ans=0.125
2023-10-09 21:08:37,349 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2873789.3333333335, ans=0.0
2023-10-09 21:08:46,970 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2873836.0, ans=0.0
2023-10-09 21:08:47,011 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2873836.0, ans=0.125
2023-10-09 21:08:59,003 INFO [train.py:1031] (2/4) Epoch 14, batch 31100, loss[loss=0.1796, simple_loss=0.2448, pruned_loss=0.04256, ctc_loss=0.07314, over 16836.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.2696, pruned_loss=0.05729, ctc_loss=0.1004, over 3288287.43 frames. ], batch size: 164, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:09:04,087 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2873882.6666666665, ans=0.0
2023-10-09 21:09:10,452 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2873929.3333333335, ans=0.125
2023-10-09 21:09:11,540 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2873929.3333333335, ans=0.1
2023-10-09 21:09:20,663 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:09:26,091 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.900e+02 3.232e+02 3.684e+02 6.119e+02, threshold=6.464e+02, percent-clipped=0.0
2023-10-09 21:09:33,674 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2874022.6666666665, ans=0.0
2023-10-09 21:09:38,472 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2874022.6666666665, ans=0.125
2023-10-09 21:09:57,959 INFO [train.py:1031] (2/4) Epoch 14, batch 31150, loss[loss=0.1626, simple_loss=0.2138, pruned_loss=0.0415, ctc_loss=0.07128, over 13287.00 frames. ], tot_loss[loss=0.2151, simple_loss=0.271, pruned_loss=0.05896, ctc_loss=0.1034, over 3291918.86 frames. ], batch size: 50, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:10:01,717 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=22.5
2023-10-09 21:10:03,645 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2874116.0, ans=0.0
2023-10-09 21:10:36,456 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2874256.0, ans=0.125
2023-10-09 21:10:38,659 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2874256.0, ans=0.2
2023-10-09 21:10:42,302 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2874256.0, ans=0.1
2023-10-09 21:10:45,592 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2874302.6666666665, ans=0.125
2023-10-09 21:10:48,794 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=15.0
2023-10-09 21:10:55,807 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2874302.6666666665, ans=0.2
2023-10-09 21:10:57,602 INFO [train.py:1031] (2/4) Epoch 14, batch 31200, loss[loss=0.234, simple_loss=0.2745, pruned_loss=0.07108, ctc_loss=0.1284, over 16744.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.2708, pruned_loss=0.05937, ctc_loss=0.1039, over 3296597.43 frames. ], batch size: 309, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:11:06,437 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2874349.3333333335, ans=0.1
2023-10-09 21:11:27,522 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.397e+02 3.224e+02 3.763e+02 4.515e+02 7.909e+02, threshold=7.526e+02, percent-clipped=5.0
2023-10-09 21:11:40,643 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2874489.3333333335, ans=0.0
2023-10-09 21:11:40,913 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0
2023-10-09 21:11:42,720 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2874489.3333333335, ans=0.07
2023-10-09 21:11:55,210 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2874536.0, ans=0.125
2023-10-09 21:11:58,148 INFO [train.py:1031] (2/4) Epoch 14, batch 31250, loss[loss=0.2165, simple_loss=0.2716, pruned_loss=0.05993, ctc_loss=0.104, over 16791.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.2676, pruned_loss=0.05924, ctc_loss=0.1037, over 3287457.20 frames. ], batch size: 202, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:12:07,037 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2874582.6666666665, ans=0.125
2023-10-09 21:12:13,216 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.13 vs. limit=22.5
2023-10-09 21:12:27,026 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=22.5
2023-10-09 21:12:43,285 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2874722.6666666665, ans=0.125
2023-10-09 21:12:46,069 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2874722.6666666665, ans=0.0
2023-10-09 21:12:48,234 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2874769.3333333335, ans=0.0
2023-10-09 21:12:58,376 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=2874769.3333333335, ans=0.0
2023-10-09 21:13:01,891 INFO [train.py:1031] (2/4) Epoch 14, batch 31300, loss[loss=0.226, simple_loss=0.2643, pruned_loss=0.07142, ctc_loss=0.1123, over 16588.00 frames. ], tot_loss[loss=0.2127, simple_loss=0.2654, pruned_loss=0.05923, ctc_loss=0.1036, over 3295008.30 frames. ], batch size: 110, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:13:20,082 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2874862.6666666665, ans=0.125
2023-10-09 21:13:23,013 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2874862.6666666665, ans=0.2
2023-10-09 21:13:23,044 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2874862.6666666665, ans=0.125
2023-10-09 21:13:31,461 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2874909.3333333335, ans=0.125
2023-10-09 21:13:32,852 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.502e+02 3.069e+02 3.507e+02 3.932e+02 8.166e+02, threshold=7.015e+02, percent-clipped=1.0
2023-10-09 21:14:03,886 INFO [train.py:1031] (2/4) Epoch 14, batch 31350, loss[loss=0.1979, simple_loss=0.25, pruned_loss=0.05326, ctc_loss=0.09811, over 16718.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2619, pruned_loss=0.05908, ctc_loss=0.103, over 3293233.45 frames. ], batch size: 272, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:14:09,465 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2875049.3333333335, ans=0.0
2023-10-09 21:14:33,733 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2875142.6666666665, ans=0.125
2023-10-09 21:14:39,860 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0
2023-10-09 21:14:40,772 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2875189.3333333335, ans=0.1
2023-10-09 21:14:41,955 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0
2023-10-09 21:14:53,360 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=22.5
2023-10-09 21:15:02,105 INFO [train.py:1031] (2/4) Epoch 14, batch 31400, loss[loss=0.1733, simple_loss=0.2518, pruned_loss=0.03367, ctc_loss=0.06871, over 16731.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2597, pruned_loss=0.05886, ctc_loss=0.1025, over 3298677.53 frames. ], batch size: 272, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:15:19,982 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=15.0
2023-10-09 21:15:29,863 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2875376.0, ans=0.125
2023-10-09 21:15:31,000 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2875376.0, ans=0.125
2023-10-09 21:15:34,909 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.303e+02 3.143e+02 3.682e+02 4.471e+02 1.037e+03, threshold=7.364e+02, percent-clipped=4.0
2023-10-09 21:15:42,584 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2875422.6666666665, ans=0.125
2023-10-09 21:16:03,129 INFO [train.py:1031] (2/4) Epoch 14, batch 31450, loss[loss=0.1896, simple_loss=0.226, pruned_loss=0.05753, ctc_loss=0.09514, over 10865.00 frames. ], tot_loss[loss=0.2062, simple_loss=0.2574, pruned_loss=0.05746, ctc_loss=0.1003, over 3287295.43 frames. ], batch size: 35, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:16:56,427 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2875702.6666666665, ans=0.125
2023-10-09 21:16:56,437 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2875702.6666666665, ans=0.125
2023-10-09 21:16:57,597 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2875702.6666666665, ans=0.1
2023-10-09 21:17:05,436 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2875749.3333333335, ans=0.0
2023-10-09 21:17:06,131 INFO [train.py:1031] (2/4) Epoch 14, batch 31500, loss[loss=0.247, simple_loss=0.3026, pruned_loss=0.07143, ctc_loss=0.1214, over 16711.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2573, pruned_loss=0.05828, ctc_loss=0.1018, over 3290003.18 frames. ], batch size: 271, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:17:40,594 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.158e+02 3.690e+02 4.602e+02 7.979e+02, threshold=7.380e+02, percent-clipped=2.0
2023-10-09 21:17:56,118 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=2875936.0, ans=0.125
2023-10-09 21:18:06,252 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0
2023-10-09 21:18:09,320 INFO [train.py:1031] (2/4) Epoch 14, batch 31550, loss[loss=0.1992, simple_loss=0.2525, pruned_loss=0.05468, ctc_loss=0.09129, over 16927.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2642, pruned_loss=0.06013, ctc_loss=0.1046, over 3297928.22 frames. ], batch size: 86, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:18:16,652 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2875982.6666666665, ans=0.125
2023-10-09 21:18:31,296 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.65 vs. limit=15.0
2023-10-09 21:18:37,313 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=2876076.0, ans=0.125
2023-10-09 21:18:38,412 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=2876076.0, ans=0.125
2023-10-09 21:18:40,946 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2876076.0, ans=0.125
2023-10-09 21:18:45,894 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2876122.6666666665, ans=0.1
2023-10-09 21:18:46,175 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.97 vs. limit=10.0
2023-10-09 21:18:49,595 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2876122.6666666665, ans=0.5
2023-10-09 21:18:50,699 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=2876122.6666666665, ans=0.025
2023-10-09 21:18:52,468 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2876122.6666666665, ans=0.07
2023-10-09 21:19:09,419 INFO [train.py:1031] (2/4) Epoch 14, batch 31600, loss[loss=0.2182, simple_loss=0.2678, pruned_loss=0.06315, ctc_loss=0.1058, over 16769.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2705, pruned_loss=0.06209, ctc_loss=0.1079, over 3305301.43 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:19:10,698 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0
2023-10-09 21:19:45,052 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.637e+02 3.252e+02 3.692e+02 4.282e+02 8.692e+02, threshold=7.384e+02, percent-clipped=4.0
2023-10-09 21:20:13,367 INFO [train.py:1031] (2/4) Epoch 14, batch 31650, loss[loss=0.2183, simple_loss=0.3071, pruned_loss=0.04789, ctc_loss=0.08437, over 16293.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2732, pruned_loss=0.06146, ctc_loss=0.107, over 3312129.91 frames. ], batch size: 463, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:20:22,923 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0
2023-10-09 21:20:25,705 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2876496.0, ans=0.1
2023-10-09 21:20:47,054 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.31 vs. limit=12.0
2023-10-09 21:21:15,702 INFO [train.py:1031] (2/4) Epoch 14, batch 31700, loss[loss=0.2422, simple_loss=0.2884, pruned_loss=0.07177, ctc_loss=0.1313, over 15282.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.2756, pruned_loss=0.06024, ctc_loss=0.1054, over 3319527.48 frames. ], batch size: 527, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:21:22,483 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2876682.6666666665, ans=0.1
2023-10-09 21:21:29,245 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2876729.3333333335, ans=0.1
2023-10-09 21:21:43,300 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.93 vs. limit=10.0
2023-10-09 21:21:47,493 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2876776.0, ans=0.1
2023-10-09 21:21:52,723 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.162e+02 3.115e+02 3.904e+02 4.739e+02 1.536e+03, threshold=7.807e+02, percent-clipped=3.0
2023-10-09 21:22:08,807 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.41 vs. limit=6.0
2023-10-09 21:22:10,685 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2876869.3333333335, ans=0.125
2023-10-09 21:22:18,292 INFO [train.py:1031] (2/4) Epoch 14, batch 31750, loss[loss=0.2265, simple_loss=0.2919, pruned_loss=0.05934, ctc_loss=0.1061, over 16833.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.281, pruned_loss=0.0626, ctc_loss=0.1099, over 3316759.94 frames. ], batch size: 228, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:22:48,647 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2877009.3333333335, ans=0.1
2023-10-09 21:22:49,894 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2877009.3333333335, ans=0.0
2023-10-09 21:22:58,938 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=2877056.0, ans=0.04949747468305833
2023-10-09 21:23:06,128 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0
2023-10-09 21:23:07,023 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2877102.6666666665, ans=0.2
2023-10-09 21:23:08,140 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2877102.6666666665, ans=0.0
2023-10-09 21:23:10,734 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2877102.6666666665, ans=0.0
2023-10-09 21:23:12,395 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2877102.6666666665, ans=0.0
2023-10-09 21:23:19,864 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2877149.3333333335, ans=0.05
2023-10-09 21:23:20,607 INFO [train.py:1031] (2/4) Epoch 14, batch 31800, loss[loss=0.1954, simple_loss=0.2584, pruned_loss=0.05, ctc_loss=0.08115, over 16901.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2813, pruned_loss=0.06344, ctc_loss=0.1116, over 3318739.34 frames. ], batch size: 95, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:23:36,529 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2877196.0, ans=0.0
2023-10-09 21:23:45,724 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2877242.6666666665, ans=0.04949747468305833
2023-10-09 21:23:55,508 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2877242.6666666665, ans=0.2
2023-10-09 21:23:56,617 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=2877289.3333333335, ans=0.125
2023-10-09 21:23:57,868 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.499e+02 3.284e+02 3.680e+02 4.274e+02 9.032e+02, threshold=7.360e+02, percent-clipped=1.0
2023-10-09 21:24:16,355 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2877336.0, ans=0.125
2023-10-09 21:24:22,030 INFO [train.py:1031] (2/4) Epoch 14, batch 31850, loss[loss=0.2104, simple_loss=0.2593, pruned_loss=0.05957, ctc_loss=0.1056, over 16812.00 frames. ], tot_loss[loss=0.227, simple_loss=0.28, pruned_loss=0.06438, ctc_loss=0.113, over 3319654.63 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:24:22,355 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2877382.6666666665, ans=0.0
2023-10-09 21:24:25,472 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2877382.6666666665, ans=0.0
2023-10-09 21:24:30,130 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=12.0
2023-10-09 21:24:35,773 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2877429.3333333335, ans=0.125
2023-10-09 21:24:40,160 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2877429.3333333335, ans=0.0
2023-10-09 21:24:48,974 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2877476.0, ans=0.1
2023-10-09 21:24:56,908 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2877522.6666666665, ans=0.125
2023-10-09 21:25:02,676 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.00 vs. limit=6.0
2023-10-09 21:25:23,224 INFO [train.py:1031] (2/4) Epoch 14, batch 31900, loss[loss=0.2021, simple_loss=0.2555, pruned_loss=0.05628, ctc_loss=0.09029, over 16914.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2743, pruned_loss=0.0636, ctc_loss=0.1114, over 3310519.10 frames. ], batch size: 86, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:25:28,968 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.36 vs. limit=15.0
2023-10-09 21:25:31,339 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2877616.0, ans=0.125
2023-10-09 21:25:35,110 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2877662.6666666665, ans=0.04949747468305833
2023-10-09 21:25:50,146 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2877709.3333333335, ans=0.0
2023-10-09 21:25:51,302 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=2877709.3333333335, ans=0.125
2023-10-09 21:26:03,165 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.340e+02 3.194e+02 3.623e+02 4.182e+02 7.324e+02, threshold=7.246e+02, percent-clipped=0.0
2023-10-09 21:26:12,175 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2877802.6666666665, ans=0.07
2023-10-09 21:26:23,435 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=2877802.6666666665, ans=0.5
2023-10-09 21:26:25,780 INFO [train.py:1031] (2/4) Epoch 14, batch 31950, loss[loss=0.1752, simple_loss=0.2392, pruned_loss=0.0412, ctc_loss=0.07198, over 16777.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2676, pruned_loss=0.06007, ctc_loss=0.1058, over 3316746.13 frames. ], batch size: 215, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:26:29,327 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2877849.3333333335, ans=0.1
2023-10-09 21:26:56,741 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=12.0
2023-10-09 21:27:03,059 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2877989.3333333335, ans=10.0
2023-10-09 21:27:12,088 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2877989.3333333335, ans=0.0
2023-10-09 21:27:20,266 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2878036.0, ans=0.2
2023-10-09 21:27:22,293 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2878036.0, ans=0.1
2023-10-09 21:27:26,617 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.52 vs. limit=15.0
2023-10-09 21:27:26,899 INFO [train.py:1031] (2/4) Epoch 14, batch 32000, loss[loss=0.2163, simple_loss=0.2755, pruned_loss=0.05827, ctc_loss=0.1016, over 16785.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.2636, pruned_loss=0.05946, ctc_loss=0.1046, over 3305781.27 frames. ], batch size: 292, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:27:28,907 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2878082.6666666665, ans=0.125
2023-10-09 21:28:06,209 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2878222.6666666665, ans=0.0
2023-10-09 21:28:06,910 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.395e+02 3.047e+02 3.544e+02 4.263e+02 6.076e+02, threshold=7.087e+02, percent-clipped=0.0
2023-10-09 21:28:23,871 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2878269.3333333335, ans=0.125
2023-10-09 21:28:30,825 INFO [train.py:1031] (2/4) Epoch 14, batch 32050, loss[loss=0.231, simple_loss=0.3081, pruned_loss=0.05558, ctc_loss=0.1069, over 16802.00 frames. ], tot_loss[loss=0.2125, simple_loss=0.267, pruned_loss=0.05834, ctc_loss=0.1032, over 3300355.98 frames. ], batch size: 272, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:28:34,962 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=22.5
2023-10-09 21:28:35,484 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2878316.0, ans=0.125
2023-10-09 21:28:40,387 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.01 vs. limit=10.0
2023-10-09 21:29:20,408 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2878502.6666666665, ans=0.1
2023-10-09 21:29:22,401 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2878502.6666666665, ans=0.125
2023-10-09 21:29:31,433 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:29:33,753 INFO [train.py:1031] (2/4) Epoch 14, batch 32100, loss[loss=0.2119, simple_loss=0.2963, pruned_loss=0.04722, ctc_loss=0.08239, over 16804.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2727, pruned_loss=0.05707, ctc_loss=0.1011, over 3304446.65 frames. ], batch size: 272, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:30:13,059 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 2.944e+02 3.415e+02 4.141e+02 9.202e+02, threshold=6.830e+02, percent-clipped=4.0
2023-10-09 21:30:13,356 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:30:14,485 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2878689.3333333335, ans=0.0
2023-10-09 21:30:32,512 INFO [train.py:1031] (2/4) Epoch 14, batch 32150, loss[loss=0.1985, simple_loss=0.2552, pruned_loss=0.05335, ctc_loss=0.08779, over 16876.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2729, pruned_loss=0.05575, ctc_loss=0.09824, over 3297532.72 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:30:40,984 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2878782.6666666665, ans=0.125
2023-10-09 21:30:53,663 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2878829.3333333335, ans=0.2
2023-10-09 21:31:33,112 INFO [train.py:1031] (2/4) Epoch 14, batch 32200, loss[loss=0.1969, simple_loss=0.2542, pruned_loss=0.05228, ctc_loss=0.08762, over 16040.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.2691, pruned_loss=0.05668, ctc_loss=0.09958, over 3297689.12 frames. ], batch size: 70, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:31:36,585 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2879016.0, ans=0.125
2023-10-09 21:31:46,167 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.95 vs. limit=15.0
2023-10-09 21:32:05,547 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=2879109.3333333335, ans=0.125
2023-10-09 21:32:14,183 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.422e+02 3.053e+02 3.349e+02 3.952e+02 6.213e+02, threshold=6.698e+02, percent-clipped=0.0
2023-10-09 21:32:22,593 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2879202.6666666665, ans=0.125
2023-10-09 21:32:29,162 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=2879202.6666666665, ans=0.025
2023-10-09 21:32:32,675 INFO [train.py:1031] (2/4) Epoch 14, batch 32250, loss[loss=0.2288, simple_loss=0.2671, pruned_loss=0.07168, ctc_loss=0.1177, over 16713.00 frames. ], tot_loss[loss=0.2108, simple_loss=0.2659, pruned_loss=0.05771, ctc_loss=0.1009, over 3294040.20 frames. ], batch size: 272, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:32:41,506 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2879249.3333333335, ans=0.125
2023-10-09 21:32:43,342 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=15.0
2023-10-09 21:33:07,771 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2879389.3333333335, ans=0.1
2023-10-09 21:33:26,537 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0
2023-10-09 21:33:33,810 INFO [train.py:1031] (2/4) Epoch 14, batch 32300, loss[loss=0.2141, simple_loss=0.2755, pruned_loss=0.05639, ctc_loss=0.09982, over 16804.00 frames. ], tot_loss[loss=0.2113, simple_loss=0.2641, pruned_loss=0.05873, ctc_loss=0.1027, over 3297844.87 frames. ], batch size: 176, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:34:00,053 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2879576.0, ans=0.0
2023-10-09 21:34:01,674 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2879576.0, ans=0.0
2023-10-09 21:34:08,294 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2879576.0, ans=0.125
2023-10-09 21:34:19,600 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.554e+02 3.404e+02 3.971e+02 4.753e+02 7.959e+02, threshold=7.942e+02, percent-clipped=3.0
2023-10-09 21:34:27,918 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2879669.3333333335, ans=0.125
2023-10-09 21:34:30,743 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2879669.3333333335, ans=0.0
2023-10-09 21:34:34,599 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=2879669.3333333335, ans=0.125
2023-10-09 21:34:39,151 INFO [train.py:1031] (2/4) Epoch 14, batch 32350, loss[loss=0.216, simple_loss=0.3453, pruned_loss=0.03087, ctc_loss=0.0624, over 15238.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2742, pruned_loss=0.0598, ctc_loss=0.106, over 3304260.21 frames. ], batch size: 527, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:34:56,422 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.29 vs. limit=10.0
2023-10-09 21:34:56,465 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0
2023-10-09 21:34:59,371 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2879762.6666666665, ans=0.0
2023-10-09 21:35:08,317 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=2879809.3333333335, ans=0.2
2023-10-09 21:35:18,612 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:35:20,706 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2879856.0, ans=0.09899494936611666
2023-10-09 21:35:40,798 INFO [train.py:1031] (2/4) Epoch 14, batch 32400, loss[loss=0.1921, simple_loss=0.2602, pruned_loss=0.04558, ctc_loss=0.08218, over 16721.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2763, pruned_loss=0.05957, ctc_loss=0.1055, over 3305196.56 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:35:41,116 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2879949.3333333335, ans=0.125
2023-10-09 21:35:43,188 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2879949.3333333335, ans=0.125
2023-10-09 21:36:04,081 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0
2023-10-09 21:36:12,621 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=2880042.6666666665, ans=0.125
2023-10-09 21:36:13,659 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2880042.6666666665, ans=0.1
2023-10-09 21:36:15,561 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5
2023-10-09 21:36:26,219 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.219e+02 3.562e+02 4.144e+02 6.944e+02, threshold=7.124e+02, percent-clipped=0.0
2023-10-09 21:36:41,713 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:36:43,391 INFO [train.py:1031] (2/4) Epoch 14, batch 32450, loss[loss=0.2108, simple_loss=0.2574, pruned_loss=0.06033, ctc_loss=0.1089, over 16811.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2746, pruned_loss=0.06062, ctc_loss=0.107, over 3310932.37 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:36:47,360 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2880182.6666666665, ans=0.125
2023-10-09 21:36:48,976 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2880182.6666666665, ans=0.125
2023-10-09 21:36:51,089 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2880182.6666666665, ans=0.1
2023-10-09 21:37:08,154 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2880276.0, ans=0.1
2023-10-09 21:37:18,079 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.34 vs. limit=12.0
2023-10-09 21:37:24,338 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2880322.6666666665, ans=0.0
2023-10-09 21:37:24,416 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2880322.6666666665, ans=0.1
2023-10-09 21:37:33,382 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2880369.3333333335, ans=0.07
2023-10-09 21:37:39,434 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0
2023-10-09 21:37:44,360 INFO [train.py:1031] (2/4) Epoch 14, batch 32500, loss[loss=0.1891, simple_loss=0.2415, pruned_loss=0.05081, ctc_loss=0.08769, over 16741.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2699, pruned_loss=0.06042, ctc_loss=0.1061, over 3319619.64 frames. ], batch size: 121, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:37:51,997 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0
2023-10-09 21:38:16,677 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=15.0
2023-10-09 21:38:16,677 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=2880509.3333333335, ans=15.0
2023-10-09 21:38:17,319 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2880509.3333333335, ans=0.1
2023-10-09 21:38:26,343 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.88 vs. limit=15.0
2023-10-09 21:38:31,980 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.513e+02 2.966e+02 3.455e+02 3.936e+02 8.435e+02, threshold=6.910e+02, percent-clipped=1.0
2023-10-09 21:38:34,425 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2880602.6666666665, ans=0.0
2023-10-09 21:38:46,476 INFO [train.py:1031] (2/4) Epoch 14, batch 32550, loss[loss=0.1869, simple_loss=0.2446, pruned_loss=0.04665, ctc_loss=0.08964, over 16822.00 frames. ], tot_loss[loss=0.2062, simple_loss=0.2628, pruned_loss=0.05533, ctc_loss=0.09754, over 3317011.53 frames. ], batch size: 329, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:38:48,936 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2880649.3333333335, ans=0.0
2023-10-09 21:39:20,323 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2880742.6666666665, ans=0.0
2023-10-09 21:39:47,249 INFO [train.py:1031] (2/4) Epoch 14, batch 32600, loss[loss=0.1972, simple_loss=0.2483, pruned_loss=0.05538, ctc_loss=0.08848, over 16880.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2592, pruned_loss=0.05469, ctc_loss=0.09616, over 3314837.01 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:39:59,562 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=17.13 vs. limit=15.0
2023-10-09 21:40:12,642 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=2880976.0, ans=0.07
2023-10-09 21:40:15,618 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2880976.0, ans=0.125
2023-10-09 21:40:34,045 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.902e+02 3.409e+02 5.024e+02 1.088e+03, threshold=6.817e+02, percent-clipped=5.0
2023-10-09 21:40:45,616 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=12.0
2023-10-09 21:40:48,728 INFO [train.py:1031] (2/4) Epoch 14, batch 32650, loss[loss=0.2496, simple_loss=0.3037, pruned_loss=0.07373, ctc_loss=0.1203, over 16843.00 frames. ], tot_loss[loss=0.2094, simple_loss=0.266, pruned_loss=0.05671, ctc_loss=0.09858, over 3289021.78 frames. ], batch size: 228, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:40:52,034 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0
2023-10-09 21:41:26,131 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:41:40,853 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2881302.6666666665, ans=0.1
2023-10-09 21:41:48,096 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=2881302.6666666665, ans=10.0
2023-10-09 21:41:52,681 INFO [train.py:1031] (2/4) Epoch 14, batch 32700, loss[loss=0.2568, simple_loss=0.3167, pruned_loss=0.07196, ctc_loss=0.1324, over 16898.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2777, pruned_loss=0.06074, ctc_loss=0.1054, over 3289037.26 frames. ], batch size: 258, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:42:41,824 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.809e+02 3.539e+02 4.014e+02 5.290e+02 1.076e+03, threshold=8.028e+02, percent-clipped=8.0
2023-10-09 21:42:47,540 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=22.5
2023-10-09 21:42:55,733 INFO [train.py:1031] (2/4) Epoch 14, batch 32750, loss[loss=0.2417, simple_loss=0.2986, pruned_loss=0.06778, ctc_loss=0.1233, over 16761.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2836, pruned_loss=0.06411, ctc_loss=0.1115, over 3286903.83 frames. ], batch size: 272, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:43:07,504 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=8.56 vs. limit=12.0
2023-10-09 21:43:17,702 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=2881629.3333333335, ans=0.125
2023-10-09 21:43:24,165 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2881676.0, ans=0.125
2023-10-09 21:43:27,505 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2881676.0, ans=0.0
2023-10-09 21:43:39,173 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0
2023-10-09 21:43:50,964 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0
2023-10-09 21:43:57,093 INFO [train.py:1031] (2/4) Epoch 14, batch 32800, loss[loss=0.215, simple_loss=0.2848, pruned_loss=0.05351, ctc_loss=0.09545, over 16819.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2826, pruned_loss=0.06508, ctc_loss=0.1131, over 3287066.90 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:44:24,657 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2881909.3333333335, ans=0.125
2023-10-09 21:44:26,828 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2881909.3333333335, ans=0.1
2023-10-09 21:44:33,233 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2881956.0, ans=0.125
2023-10-09 21:44:36,086 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=2881956.0, ans=0.0
2023-10-09 21:44:46,106 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.522e+02 3.217e+02 3.697e+02 4.305e+02 8.023e+02, threshold=7.395e+02, percent-clipped=0.0
2023-10-09 21:44:57,217 INFO [train.py:1031] (2/4) Epoch 14, batch 32850, loss[loss=0.1819, simple_loss=0.2383, pruned_loss=0.04704, ctc_loss=0.07851, over 11530.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.282, pruned_loss=0.06542, ctc_loss=0.114, over 3287532.29 frames. ], batch size: 35, lr: 2.53e-03, grad_scale: 2.0
2023-10-09 21:45:08,209 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2882096.0, ans=0.125
2023-10-09 21:45:12,972 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2882096.0, ans=0.125
2023-10-09 21:45:24,022 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0
2023-10-09 21:45:59,369 INFO [train.py:1031] (2/4) Epoch 14, batch 32900, loss[loss=0.1953, simple_loss=0.25, pruned_loss=0.0522, ctc_loss=0.09051, over 10922.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.283, pruned_loss=0.06527, ctc_loss=0.1141, over 3295677.81 frames. ], batch size: 35, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:46:08,151 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2882282.6666666665, ans=0.0
2023-10-09 21:46:09,813 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2882282.6666666665, ans=0.125
2023-10-09 21:46:50,392 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2882469.3333333335, ans=0.0
2023-10-09 21:46:51,628 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.541e+02 3.233e+02 3.650e+02 4.547e+02 8.623e+02, threshold=7.299e+02, percent-clipped=2.0
2023-10-09 21:46:58,044 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2882469.3333333335, ans=0.0
2023-10-09 21:46:59,462 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=15.0
2023-10-09 21:47:02,670 INFO [train.py:1031] (2/4) Epoch 14, batch 32950, loss[loss=0.2872, simple_loss=0.3304, pruned_loss=0.08877, ctc_loss=0.166, over 16607.00 frames. ], tot_loss[loss=0.2338, simple_loss=0.2889, pruned_loss=0.06615, ctc_loss=0.1159, over 3298651.55 frames. ], batch size: 350, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:47:09,622 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:47:29,648 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2882609.3333333335, ans=0.125
2023-10-09 21:47:33,890 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=22.5
2023-10-09 21:47:38,528 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=22.5
2023-10-09 21:47:42,084 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=2882656.0, ans=0.025
2023-10-09 21:48:05,298 INFO [train.py:1031] (2/4) Epoch 14, batch 33000, loss[loss=0.2166, simple_loss=0.2453, pruned_loss=0.06901, ctc_loss=0.1247, over 15512.00 frames. ], tot_loss[loss=0.2375, simple_loss=0.2911, pruned_loss=0.06814, ctc_loss=0.119, over 3299960.21 frames. ], batch size: 526, lr: 2.53e-03, grad_scale: 8.0
2023-10-09 21:48:05,298 INFO [train.py:1054] (2/4) Computing validation loss
2023-10-09 21:48:16,129 INFO [zipformer.py:1853] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.8506, 5.7140, 5.4489, 5.9762], device='cuda:2')
2023-10-09 21:48:23,065 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2327, simple_loss=0.3031, pruned_loss=0.06268, ctc_loss=0.09218, over 1796401.00 frames.
2023-10-09 21:48:23,065 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB
2023-10-09 21:48:32,472 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=22.5
2023-10-09 21:48:42,528 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2882796.0, ans=0.0
2023-10-09 21:48:46,272 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:48:50,969 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2882842.6666666665, ans=0.1
2023-10-09 21:48:51,042 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2882842.6666666665, ans=0.1
2023-10-09 21:49:05,006 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2882889.3333333335, ans=0.125
2023-10-09 21:49:13,415 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.428e+02 3.435e+02 3.950e+02 5.096e+02 8.924e+02, threshold=7.899e+02, percent-clipped=1.0
2023-10-09 21:49:13,707 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 21:49:24,081 INFO [train.py:1031] (2/4) Epoch 14, batch 33050, loss[loss=0.2199, simple_loss=0.2675, pruned_loss=0.06386, ctc_loss=0.1114, over 16539.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2882, pruned_loss=0.06778, ctc_loss=0.1183, over 3303068.14 frames. ], batch size: 466, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:49:38,220 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=2883029.3333333335, ans=0.0
2023-10-09 21:49:40,432 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2883029.3333333335, ans=0.1
2023-10-09 21:49:45,726 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2883029.3333333335, ans=0.2
2023-10-09 21:50:00,019 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.70 vs. limit=10.0
2023-10-09 21:50:13,104 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2883169.3333333335, ans=0.0
2023-10-09 21:50:25,683 INFO [train.py:1031] (2/4) Epoch 14, batch 33100, loss[loss=0.2433, simple_loss=0.278, pruned_loss=0.07686, ctc_loss=0.1373, over 16728.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2856, pruned_loss=0.06783, ctc_loss=0.1186, over 3311024.14 frames. ], batch size: 130, lr: 2.53e-03, grad_scale: 4.0
2023-10-09 21:50:31,334 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.91 vs. limit=15.0
2023-10-09 21:50:58,932 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2883309.3333333335, ans=0.125
2023-10-09 21:51:07,694 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2883356.0, ans=0.125
2023-10-09 21:51:18,564 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.087e+02 3.637e+02 4.211e+02 8.906e+02, threshold=7.275e+02, percent-clipped=1.0
2023-10-09 21:51:22,344 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2883402.6666666665, ans=0.125
2023-10-09 21:51:28,125 INFO [train.py:1031] (2/4) Epoch 14, batch 33150, loss[loss=0.2041, simple_loss=0.2856, pruned_loss=0.04561, ctc_loss=0.07856, over 16861.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2808, pruned_loss=0.06429, ctc_loss=0.1126, over 3317295.90 frames.
], batch size: 228, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:51:33,509 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2883449.3333333335, ans=0.125 2023-10-09 21:51:33,555 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2883449.3333333335, ans=0.125 2023-10-09 21:51:35,189 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2883449.3333333335, ans=0.2 2023-10-09 21:51:42,762 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2883496.0, ans=0.125 2023-10-09 21:51:44,901 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2883496.0, ans=0.0 2023-10-09 21:51:54,658 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2883542.6666666665, ans=0.035 2023-10-09 21:52:04,627 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2883542.6666666665, ans=0.1 2023-10-09 21:52:07,414 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2883589.3333333335, ans=0.2 2023-10-09 21:52:12,539 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=22.5 2023-10-09 21:52:31,981 INFO [train.py:1031] (2/4) Epoch 14, batch 33200, loss[loss=0.225, simple_loss=0.2833, pruned_loss=0.05986, ctc_loss=0.1174, over 15289.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2796, pruned_loss=0.06241, ctc_loss=0.1103, over 3317190.62 frames. ], batch size: 527, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:52:49,159 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2883729.3333333335, ans=0.125 2023-10-09 21:52:51,317 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2883729.3333333335, ans=0.0 2023-10-09 21:53:25,119 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.492e+02 3.107e+02 3.465e+02 4.067e+02 6.400e+02, threshold=6.930e+02, percent-clipped=0.0 2023-10-09 21:53:32,624 INFO [train.py:1031] (2/4) Epoch 14, batch 33250, loss[loss=0.1906, simple_loss=0.2411, pruned_loss=0.05266, ctc_loss=0.08705, over 16742.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2752, pruned_loss=0.06248, ctc_loss=0.1102, over 3304706.71 frames. ], batch size: 188, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:53:46,360 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=2883962.6666666665, ans=15.0 2023-10-09 21:54:01,397 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-10-09 21:54:17,609 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.78 vs. 
limit=15.0 2023-10-09 21:54:22,464 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2884102.6666666665, ans=0.125 2023-10-09 21:54:35,047 INFO [train.py:1031] (2/4) Epoch 14, batch 33300, loss[loss=0.1928, simple_loss=0.2507, pruned_loss=0.04991, ctc_loss=0.08748, over 16922.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2698, pruned_loss=0.06206, ctc_loss=0.1091, over 3306695.79 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:54:35,656 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.57 vs. limit=15.0 2023-10-09 21:54:50,762 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2884196.0, ans=0.015 2023-10-09 21:55:08,675 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=15.0 2023-10-09 21:55:14,242 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.79 vs. limit=10.0 2023-10-09 21:55:20,631 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2884289.3333333335, ans=0.125 2023-10-09 21:55:32,063 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 3.141e+02 3.663e+02 4.502e+02 8.687e+02, threshold=7.326e+02, percent-clipped=2.0 2023-10-09 21:55:38,479 INFO [train.py:1031] (2/4) Epoch 14, batch 33350, loss[loss=0.2258, simple_loss=0.2881, pruned_loss=0.06104, ctc_loss=0.1034, over 16768.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2732, pruned_loss=0.06228, ctc_loss=0.1098, over 3302500.16 frames. ], batch size: 130, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:55:42,886 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2884382.6666666665, ans=15.0 2023-10-09 21:55:48,522 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2023-10-09 21:55:55,893 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2884429.3333333335, ans=0.125 2023-10-09 21:55:56,948 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=2884429.3333333335, ans=0.0 2023-10-09 21:56:18,466 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2884522.6666666665, ans=0.125 2023-10-09 21:56:32,336 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2884569.3333333335, ans=0.125 2023-10-09 21:56:37,169 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:56:39,474 INFO [train.py:1031] (2/4) Epoch 14, batch 33400, loss[loss=0.2279, simple_loss=0.2752, pruned_loss=0.06813, ctc_loss=0.1108, over 16766.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2778, pruned_loss=0.063, ctc_loss=0.1111, over 3304557.65 frames. 
], batch size: 130, lr: 2.53e-03, grad_scale: 8.0 2023-10-09 21:56:50,330 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=2884662.6666666665, ans=0.0 2023-10-09 21:56:54,155 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2884662.6666666665, ans=0.125 2023-10-09 21:56:57,013 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 21:57:03,595 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2884709.3333333335, ans=0.025 2023-10-09 21:57:13,296 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2884709.3333333335, ans=0.0 2023-10-09 21:57:34,883 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2884802.6666666665, ans=0.2 2023-10-09 21:57:36,674 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.586e+02 3.309e+02 3.821e+02 4.723e+02 1.099e+03, threshold=7.641e+02, percent-clipped=5.0 2023-10-09 21:57:42,140 INFO [train.py:1031] (2/4) Epoch 14, batch 33450, loss[loss=0.2536, simple_loss=0.3459, pruned_loss=0.06033, ctc_loss=0.1018, over 15121.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2815, pruned_loss=0.06378, ctc_loss=0.1118, over 3308480.14 frames. ], batch size: 527, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 21:57:46,614 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2884849.3333333335, ans=0.125 2023-10-09 21:58:46,684 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2885082.6666666665, ans=0.125 2023-10-09 21:58:47,410 INFO [train.py:1031] (2/4) Epoch 14, batch 33500, loss[loss=0.2395, simple_loss=0.2959, pruned_loss=0.06744, ctc_loss=0.1205, over 16964.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2822, pruned_loss=0.06359, ctc_loss=0.1105, over 3296568.63 frames. ], batch size: 82, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:58:59,340 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2885129.3333333335, ans=0.1 2023-10-09 21:59:05,858 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2885129.3333333335, ans=0.125 2023-10-09 21:59:11,744 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2885176.0, ans=0.1 2023-10-09 21:59:24,989 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2885222.6666666665, ans=0.0 2023-10-09 21:59:33,991 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2885222.6666666665, ans=0.0 2023-10-09 21:59:43,487 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.13 vs. 
limit=22.5 2023-10-09 21:59:45,442 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2885269.3333333335, ans=0.1 2023-10-09 21:59:46,063 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.546e+02 3.525e+02 4.202e+02 5.122e+02 8.777e+02, threshold=8.403e+02, percent-clipped=5.0 2023-10-09 21:59:48,886 INFO [train.py:1031] (2/4) Epoch 14, batch 33550, loss[loss=0.2269, simple_loss=0.2748, pruned_loss=0.06692, ctc_loss=0.1131, over 16930.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.278, pruned_loss=0.06314, ctc_loss=0.1095, over 3297732.71 frames. ], batch size: 82, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 21:59:59,318 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2885316.0, ans=0.0 2023-10-09 22:00:12,774 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2885409.3333333335, ans=0.1 2023-10-09 22:00:24,470 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2885456.0, ans=0.125 2023-10-09 22:00:49,676 INFO [train.py:1031] (2/4) Epoch 14, batch 33600, loss[loss=0.1989, simple_loss=0.2546, pruned_loss=0.05388, ctc_loss=0.0889, over 16787.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2722, pruned_loss=0.06262, ctc_loss=0.1087, over 3301913.71 frames. ], batch size: 95, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:00:51,928 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2885549.3333333335, ans=0.1 2023-10-09 22:00:58,734 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2885549.3333333335, ans=0.125 2023-10-09 22:01:20,274 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2885642.6666666665, ans=0.2 2023-10-09 22:01:21,630 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=12.0 2023-10-09 22:01:21,630 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=2885642.6666666665, ans=12.0 2023-10-09 22:01:32,626 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.54 vs. limit=15.0 2023-10-09 22:01:48,017 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 3.170e+02 3.773e+02 4.552e+02 1.576e+03, threshold=7.545e+02, percent-clipped=1.0 2023-10-09 22:01:49,717 INFO [train.py:1031] (2/4) Epoch 14, batch 33650, loss[loss=0.2263, simple_loss=0.2791, pruned_loss=0.06479, ctc_loss=0.1096, over 16223.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2679, pruned_loss=0.06253, ctc_loss=0.1085, over 3301870.23 frames. 
], batch size: 71, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:02:07,429 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2885829.3333333335, ans=0.125 2023-10-09 22:02:51,303 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=2886016.0, ans=0.0 2023-10-09 22:02:52,472 INFO [train.py:1031] (2/4) Epoch 14, batch 33700, loss[loss=0.2519, simple_loss=0.3007, pruned_loss=0.07655, ctc_loss=0.125, over 16542.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2736, pruned_loss=0.06557, ctc_loss=0.1137, over 3301240.51 frames. ], batch size: 110, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:03:02,064 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2886016.0, ans=0.1 2023-10-09 22:03:10,259 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2886062.6666666665, ans=0.1 2023-10-09 22:03:20,174 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2886109.3333333335, ans=0.0 2023-10-09 22:03:22,955 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 22:03:29,743 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2886156.0, ans=0.0 2023-10-09 22:03:31,354 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2886156.0, ans=0.125 2023-10-09 22:03:48,436 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2886202.6666666665, ans=0.0 2023-10-09 22:03:52,830 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.688e+02 3.271e+02 3.899e+02 4.405e+02 9.865e+02, threshold=7.797e+02, percent-clipped=1.0 2023-10-09 22:03:52,857 INFO [train.py:1031] (2/4) Epoch 14, batch 33750, loss[loss=0.2201, simple_loss=0.272, pruned_loss=0.06242, ctc_loss=0.1082, over 16977.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2767, pruned_loss=0.06728, ctc_loss=0.1163, over 3302289.65 frames. ], batch size: 243, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:03:53,215 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=2886249.3333333335, ans=0.05 2023-10-09 22:04:02,114 INFO [scaling.py:979] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. limit=5.0 2023-10-09 22:04:17,325 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2886342.6666666665, ans=0.125 2023-10-09 22:04:42,408 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2023-10-09 22:04:44,235 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2886436.0, ans=0.1 2023-10-09 22:04:54,308 INFO [train.py:1031] (2/4) Epoch 14, batch 33800, loss[loss=0.22, simple_loss=0.2636, pruned_loss=0.06562, ctc_loss=0.1129, over 16766.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2774, pruned_loss=0.06752, ctc_loss=0.1168, over 3304913.59 frames. 
], batch size: 292, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:04:56,346 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2886482.6666666665, ans=0.1 2023-10-09 22:04:57,460 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2886482.6666666665, ans=0.0 2023-10-09 22:05:03,376 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2886482.6666666665, ans=0.125 2023-10-09 22:05:04,732 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=15.0 2023-10-09 22:05:12,041 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2886529.3333333335, ans=0.0 2023-10-09 22:05:46,703 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2886669.3333333335, ans=0.1 2023-10-09 22:05:55,362 INFO [train.py:1031] (2/4) Epoch 14, batch 33850, loss[loss=0.2275, simple_loss=0.2662, pruned_loss=0.06986, ctc_loss=0.1229, over 16600.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.272, pruned_loss=0.06588, ctc_loss=0.1142, over 3299431.96 frames. ], batch size: 110, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:05:56,420 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.529e+02 3.178e+02 3.599e+02 4.092e+02 7.716e+02, threshold=7.198e+02, percent-clipped=0.0 2023-10-09 22:06:05,303 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2023-10-09 22:06:33,548 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2886856.0, ans=0.125 2023-10-09 22:06:53,843 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2886902.6666666665, ans=0.05 2023-10-09 22:06:56,613 INFO [train.py:1031] (2/4) Epoch 14, batch 33900, loss[loss=0.2782, simple_loss=0.3417, pruned_loss=0.08038, ctc_loss=0.1347, over 16454.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2733, pruned_loss=0.06608, ctc_loss=0.1144, over 3276969.87 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:07:06,864 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2886949.3333333335, ans=0.0 2023-10-09 22:07:07,054 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0 2023-10-09 22:07:55,178 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. limit=12.0 2023-10-09 22:07:59,530 INFO [train.py:1031] (2/4) Epoch 14, batch 33950, loss[loss=0.2158, simple_loss=0.2619, pruned_loss=0.06399, ctc_loss=0.1042, over 16545.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.28, pruned_loss=0.06285, ctc_loss=0.1095, over 3288576.17 frames. 
], batch size: 110, lr: 2.53e-03, grad_scale: 1.0 2023-10-09 22:08:03,405 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.763e+02 3.464e+02 4.205e+02 4.959e+02 7.578e+02, threshold=8.409e+02, percent-clipped=4.0 2023-10-09 22:08:03,703 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2887182.6666666665, ans=0.125 2023-10-09 22:08:50,185 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2887369.3333333335, ans=0.0 2023-10-09 22:08:57,166 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2887369.3333333335, ans=0.0 2023-10-09 22:08:58,502 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0 2023-10-09 22:09:02,843 INFO [train.py:1031] (2/4) Epoch 14, batch 34000, loss[loss=0.2667, simple_loss=0.3788, pruned_loss=0.05523, ctc_loss=0.1106, over 16446.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2977, pruned_loss=0.06249, ctc_loss=0.1114, over 3290227.21 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:09:29,771 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2887509.3333333335, ans=0.0 2023-10-09 22:09:32,590 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2887509.3333333335, ans=0.035 2023-10-09 22:09:34,858 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2887509.3333333335, ans=0.2 2023-10-09 22:10:03,846 INFO [train.py:1031] (2/4) Epoch 14, batch 34050, loss[loss=0.2071, simple_loss=0.26, pruned_loss=0.05759, ctc_loss=0.09759, over 16725.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2944, pruned_loss=0.06009, ctc_loss=0.1077, over 3286539.44 frames. ], batch size: 140, lr: 2.53e-03, grad_scale: 1.0 2023-10-09 22:10:08,602 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.373e+02 3.098e+02 3.845e+02 4.884e+02 8.519e+02, threshold=7.690e+02, percent-clipped=1.0 2023-10-09 22:10:32,490 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.51 vs. limit=10.0 2023-10-09 22:10:35,054 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=2887742.6666666665, ans=15.0 2023-10-09 22:10:42,668 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-10-09 22:10:48,528 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2887789.3333333335, ans=0.125 2023-10-09 22:10:51,671 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2887836.0, ans=0.0 2023-10-09 22:10:56,146 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=2887836.0, ans=15.0 2023-10-09 22:11:04,697 INFO [train.py:1031] (2/4) Epoch 14, batch 34100, loss[loss=0.2429, simple_loss=0.2918, pruned_loss=0.07344, ctc_loss=0.118, over 16751.00 frames. 
], tot_loss[loss=0.2282, simple_loss=0.2913, pruned_loss=0.06086, ctc_loss=0.1085, over 3296420.18 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:11:17,698 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2887929.3333333335, ans=0.09899494936611666 2023-10-09 22:11:35,966 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=2887976.0, ans=0.2 2023-10-09 22:11:39,632 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=2887976.0, ans=0.1 2023-10-09 22:11:40,816 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2888022.6666666665, ans=0.125 2023-10-09 22:11:41,792 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2888022.6666666665, ans=0.0 2023-10-09 22:11:47,207 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2888022.6666666665, ans=0.1 2023-10-09 22:11:52,278 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2888022.6666666665, ans=0.0 2023-10-09 22:12:04,593 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.00 vs. limit=15.0 2023-10-09 22:12:05,995 INFO [train.py:1031] (2/4) Epoch 14, batch 34150, loss[loss=0.2577, simple_loss=0.3068, pruned_loss=0.0781, ctc_loss=0.131, over 16737.00 frames. ], tot_loss[loss=0.2324, simple_loss=0.293, pruned_loss=0.06342, ctc_loss=0.1125, over 3302832.65 frames. ], batch size: 102, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:12:11,412 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.572e+02 3.257e+02 3.702e+02 4.193e+02 7.598e+02, threshold=7.404e+02, percent-clipped=0.0 2023-10-09 22:12:31,853 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2888209.3333333335, ans=0.125 2023-10-09 22:13:02,601 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2888302.6666666665, ans=0.125 2023-10-09 22:13:08,609 INFO [train.py:1031] (2/4) Epoch 14, batch 34200, loss[loss=0.2115, simple_loss=0.2575, pruned_loss=0.06061, ctc_loss=0.111, over 16786.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.29, pruned_loss=0.0649, ctc_loss=0.1146, over 3308827.15 frames. ], batch size: 329, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:13:19,605 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2888396.0, ans=0.125 2023-10-09 22:13:19,658 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2888396.0, ans=0.2 2023-10-09 22:13:19,674 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2888396.0, ans=0.1 2023-10-09 22:13:28,271 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.11 vs. 
limit=22.5 2023-10-09 22:14:02,599 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2888536.0, ans=0.0 2023-10-09 22:14:09,177 INFO [train.py:1031] (2/4) Epoch 14, batch 34250, loss[loss=0.2128, simple_loss=0.2655, pruned_loss=0.0577, ctc_loss=0.112, over 16730.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2813, pruned_loss=0.06319, ctc_loss=0.1115, over 3307549.09 frames. ], batch size: 308, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:14:15,120 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2888582.6666666665, ans=0.0 2023-10-09 22:14:15,724 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.386e+02 3.191e+02 3.616e+02 4.129e+02 7.013e+02, threshold=7.231e+02, percent-clipped=0.0 2023-10-09 22:14:17,668 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2888582.6666666665, ans=0.0 2023-10-09 22:14:19,103 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.03 vs. limit=10.0 2023-10-09 22:14:36,068 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=2888676.0, ans=0.125 2023-10-09 22:15:10,737 INFO [train.py:1031] (2/4) Epoch 14, batch 34300, loss[loss=0.2741, simple_loss=0.2995, pruned_loss=0.09132, ctc_loss=0.1652, over 16626.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2796, pruned_loss=0.06389, ctc_loss=0.1124, over 3310692.09 frames. ], batch size: 351, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:15:28,775 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.79 vs. limit=15.0 2023-10-09 22:15:31,793 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.09 vs. limit=15.0 2023-10-09 22:15:34,753 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.09 vs. limit=10.0 2023-10-09 22:15:43,855 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2888909.3333333335, ans=0.125 2023-10-09 22:15:46,910 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2888956.0, ans=0.125 2023-10-09 22:16:07,657 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2889002.6666666665, ans=0.0 2023-10-09 22:16:09,863 INFO [train.py:1031] (2/4) Epoch 14, batch 34350, loss[loss=0.2308, simple_loss=0.283, pruned_loss=0.06551, ctc_loss=0.1191, over 16965.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2785, pruned_loss=0.06455, ctc_loss=0.1131, over 3309469.92 frames. 
], batch size: 309, lr: 2.53e-03, grad_scale: 2.0 2023-10-09 22:16:16,841 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.449e+02 3.283e+02 3.799e+02 4.453e+02 1.021e+03, threshold=7.599e+02, percent-clipped=4.0 2023-10-09 22:16:23,665 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2889096.0, ans=0.125 2023-10-09 22:16:41,411 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2889142.6666666665, ans=0.1 2023-10-09 22:16:50,081 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 22:16:57,357 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2889236.0, ans=0.1 2023-10-09 22:17:10,487 INFO [train.py:1031] (2/4) Epoch 14, batch 34400, loss[loss=0.2186, simple_loss=0.2784, pruned_loss=0.05923, ctc_loss=0.1005, over 16925.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2782, pruned_loss=0.06466, ctc_loss=0.113, over 3320662.90 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:17:26,517 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2889329.3333333335, ans=0.125 2023-10-09 22:18:06,414 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2889469.3333333335, ans=0.0 2023-10-09 22:18:11,086 INFO [train.py:1031] (2/4) Epoch 14, batch 34450, loss[loss=0.254, simple_loss=0.2875, pruned_loss=0.08139, ctc_loss=0.1445, over 16621.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.278, pruned_loss=0.06561, ctc_loss=0.1147, over 3313404.49 frames. ], batch size: 416, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:18:19,265 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.547e+02 3.186e+02 3.591e+02 4.331e+02 7.838e+02, threshold=7.182e+02, percent-clipped=2.0 2023-10-09 22:18:40,407 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2889609.3333333335, ans=0.1 2023-10-09 22:19:07,846 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-10-09 22:19:14,166 INFO [train.py:1031] (2/4) Epoch 14, batch 34500, loss[loss=0.2031, simple_loss=0.2409, pruned_loss=0.06276, ctc_loss=0.0992, over 16591.00 frames. ], tot_loss[loss=0.233, simple_loss=0.2844, pruned_loss=0.06721, ctc_loss=0.1177, over 3310186.07 frames. ], batch size: 70, lr: 2.53e-03, grad_scale: 4.0 2023-10-09 22:19:14,443 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=2889749.3333333335, ans=0.0 2023-10-09 22:19:27,070 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.24 vs. 
limit=22.5 2023-10-09 22:19:31,605 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2889796.0, ans=0.0 2023-10-09 22:19:33,375 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2889796.0, ans=0.0 2023-10-09 22:19:57,268 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=22.5 2023-10-09 22:19:58,476 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.46 vs. limit=10.0 2023-10-09 22:19:59,516 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2889889.3333333335, ans=0.125 2023-10-09 22:20:06,443 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2023-10-09 22:20:07,298 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2889936.0, ans=0.125 2023-10-09 22:20:10,189 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2889936.0, ans=0.1 2023-10-09 22:20:16,407 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2889936.0, ans=0.125 2023-10-09 22:20:20,485 INFO [train.py:1031] (2/4) Epoch 14, batch 34550, loss[loss=0.3223, simple_loss=0.3601, pruned_loss=0.1043, ctc_loss=0.1901, over 16659.00 frames. ], tot_loss[loss=0.2348, simple_loss=0.2917, pruned_loss=0.06572, ctc_loss=0.1162, over 3308289.22 frames. ], batch size: 351, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:20:30,360 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.650e+02 3.661e+02 4.529e+02 6.004e+02 9.470e+02, threshold=9.059e+02, percent-clipped=10.0 2023-10-09 22:21:12,333 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2890169.3333333335, ans=0.125 2023-10-09 22:21:20,696 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=2890169.3333333335, ans=0.0 2023-10-09 22:21:22,646 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=2890216.0, ans=0.125 2023-10-09 22:21:24,118 INFO [train.py:1031] (2/4) Epoch 14, batch 34600, loss[loss=0.1733, simple_loss=0.2281, pruned_loss=0.04454, ctc_loss=0.07343, over 16730.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.2899, pruned_loss=0.06389, ctc_loss=0.1132, over 3298945.09 frames. 
], batch size: 102, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:21:37,065 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=2890262.6666666665, ans=0.125 2023-10-09 22:21:42,978 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2890262.6666666665, ans=0.1 2023-10-09 22:21:56,069 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2890309.3333333335, ans=0.125 2023-10-09 22:22:07,944 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2890356.0, ans=0.125 2023-10-09 22:22:13,766 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=2890402.6666666665, ans=0.2 2023-10-09 22:22:25,917 INFO [train.py:1031] (2/4) Epoch 14, batch 34650, loss[loss=0.2212, simple_loss=0.2787, pruned_loss=0.05969, ctc_loss=0.1109, over 16963.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2848, pruned_loss=0.06129, ctc_loss=0.1091, over 3303389.15 frames. ], batch size: 243, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:22:26,320 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2890449.3333333335, ans=0.0 2023-10-09 22:22:30,014 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=2890449.3333333335, ans=0.0 2023-10-09 22:22:37,036 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.123e+02 2.928e+02 3.445e+02 4.113e+02 6.666e+02, threshold=6.890e+02, percent-clipped=0.0 2023-10-09 22:22:41,488 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=22.5 2023-10-09 22:23:16,273 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2890636.0, ans=0.125 2023-10-09 22:23:27,777 INFO [train.py:1031] (2/4) Epoch 14, batch 34700, loss[loss=0.2565, simple_loss=0.2984, pruned_loss=0.07974, ctc_loss=0.1376, over 16823.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2838, pruned_loss=0.06262, ctc_loss=0.1111, over 3308229.02 frames. ], batch size: 121, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:23:29,831 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2890682.6666666665, ans=0.0 2023-10-09 22:23:31,164 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2023-10-09 22:23:47,036 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2890729.3333333335, ans=0.125 2023-10-09 22:23:47,085 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2890729.3333333335, ans=0.125 2023-10-09 22:23:49,981 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2890729.3333333335, ans=0.125 2023-10-09 22:24:31,578 INFO [train.py:1031] (2/4) Epoch 14, batch 34750, loss[loss=0.237, simple_loss=0.2914, pruned_loss=0.06816, ctc_loss=0.1157, over 16940.00 frames. 
], tot_loss[loss=0.2329, simple_loss=0.2874, pruned_loss=0.06599, ctc_loss=0.1163, over 3314946.82 frames. ], batch size: 229, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:24:38,674 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2890916.0, ans=0.125 2023-10-09 22:24:42,691 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.997e+02 3.549e+02 4.003e+02 4.772e+02 8.039e+02, threshold=8.005e+02, percent-clipped=2.0 2023-10-09 22:25:14,925 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=15.0 2023-10-09 22:25:21,600 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2891102.6666666665, ans=0.1 2023-10-09 22:25:31,229 INFO [train.py:1031] (2/4) Epoch 14, batch 34800, loss[loss=0.2429, simple_loss=0.2922, pruned_loss=0.06957, ctc_loss=0.1361, over 16855.00 frames. ], tot_loss[loss=0.2346, simple_loss=0.2872, pruned_loss=0.06732, ctc_loss=0.1183, over 3316464.70 frames. ], batch size: 328, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:25:57,936 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=2891242.6666666665, ans=0.07 2023-10-09 22:26:00,925 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.97 vs. limit=10.0 2023-10-09 22:26:33,343 INFO [train.py:1031] (2/4) Epoch 14, batch 34850, loss[loss=0.2145, simple_loss=0.2323, pruned_loss=0.07077, ctc_loss=0.1376, over 15318.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.284, pruned_loss=0.06703, ctc_loss=0.1176, over 3315956.46 frames. ], batch size: 526, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:26:46,831 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.513e+02 3.209e+02 3.596e+02 4.244e+02 8.793e+02, threshold=7.192e+02, percent-clipped=1.0 2023-10-09 22:26:51,917 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0 2023-10-09 22:27:01,508 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2891476.0, ans=0.0 2023-10-09 22:27:22,149 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2891569.3333333335, ans=0.05 2023-10-09 22:27:23,637 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=2891569.3333333335, ans=0.2 2023-10-09 22:27:35,831 INFO [train.py:1031] (2/4) Epoch 14, batch 34900, loss[loss=0.215, simple_loss=0.2798, pruned_loss=0.05565, ctc_loss=0.09729, over 16637.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2784, pruned_loss=0.06572, ctc_loss=0.1152, over 3321267.03 frames. 
], batch size: 151, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 22:27:42,606 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2891616.0, ans=0.1 2023-10-09 22:27:45,270 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2891616.0, ans=10.0 2023-10-09 22:27:48,982 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2891662.6666666665, ans=0.0 2023-10-09 22:27:51,400 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0 2023-10-09 22:27:53,177 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2891662.6666666665, ans=0.125 2023-10-09 22:27:54,241 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=2891662.6666666665, ans=0.125 2023-10-09 22:28:06,390 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2023-10-09 22:28:38,937 INFO [train.py:1031] (2/4) Epoch 14, batch 34950, loss[loss=0.2465, simple_loss=0.2923, pruned_loss=0.07437, ctc_loss=0.1297, over 16228.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2801, pruned_loss=0.06573, ctc_loss=0.115, over 3303264.14 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 22:28:39,388 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2891849.3333333335, ans=0.125 2023-10-09 22:28:49,450 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2891849.3333333335, ans=0.05 2023-10-09 22:28:54,409 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.302e+02 3.348e+02 3.779e+02 4.801e+02 1.162e+03, threshold=7.559e+02, percent-clipped=3.0 2023-10-09 22:28:57,561 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2891896.0, ans=0.1 2023-10-09 22:29:13,161 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.92 vs. limit=6.0 2023-10-09 22:29:22,988 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=15.0 2023-10-09 22:29:42,587 INFO [train.py:1031] (2/4) Epoch 14, batch 35000, loss[loss=0.2027, simple_loss=0.282, pruned_loss=0.044, ctc_loss=0.08855, over 15179.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2816, pruned_loss=0.06537, ctc_loss=0.1147, over 3305926.24 frames. 
], batch size: 526, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:29:54,859 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2892129.3333333335, ans=0.125
2023-10-09 22:30:22,470 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2892222.6666666665, ans=0.0
2023-10-09 22:30:25,014 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0
2023-10-09 22:30:44,817 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0
2023-10-09 22:30:48,023 INFO [train.py:1031] (2/4) Epoch 14, batch 35050, loss[loss=0.2356, simple_loss=0.2854, pruned_loss=0.06984, ctc_loss=0.1153, over 16841.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2828, pruned_loss=0.0643, ctc_loss=0.1133, over 3307627.10 frames. ], batch size: 121, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:30:57,229 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0
2023-10-09 22:31:00,793 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=2892362.6666666665, ans=0.125
2023-10-09 22:31:04,384 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+02 3.170e+02 3.753e+02 4.510e+02 9.970e+02, threshold=7.506e+02, percent-clipped=2.0
2023-10-09 22:31:06,407 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2892362.6666666665, ans=0.1
2023-10-09 22:31:15,650 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0
2023-10-09 22:31:21,159 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0
2023-10-09 22:31:24,633 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2892409.3333333335, ans=0.1
2023-10-09 22:31:26,733 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2892456.0, ans=0.125
2023-10-09 22:31:36,844 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=2892456.0, ans=0.07
2023-10-09 22:31:41,812 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2892502.6666666665, ans=0.0
2023-10-09 22:31:51,705 INFO [train.py:1031] (2/4) Epoch 14, batch 35100, loss[loss=0.1971, simple_loss=0.2574, pruned_loss=0.0511, ctc_loss=0.08668, over 16742.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2854, pruned_loss=0.06412, ctc_loss=0.1135, over 3300841.89 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:32:00,086 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2892549.3333333335, ans=0.0
2023-10-09 22:32:08,536 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2892596.0, ans=0.0
2023-10-09 22:32:14,694 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0
2023-10-09 22:32:19,021 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:32:31,526 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=2892689.3333333335, ans=0.125
2023-10-09 22:32:34,331 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=2892689.3333333335, ans=0.0
2023-10-09 22:32:49,002 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.65 vs. limit=15.0
2023-10-09 22:32:54,766 INFO [train.py:1031] (2/4) Epoch 14, batch 35150, loss[loss=0.2474, simple_loss=0.3045, pruned_loss=0.06978, ctc_loss=0.1267, over 16839.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2868, pruned_loss=0.06474, ctc_loss=0.1144, over 3287641.68 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:33:12,901 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.596e+02 3.271e+02 3.877e+02 4.489e+02 9.044e+02, threshold=7.754e+02, percent-clipped=1.0
2023-10-09 22:33:21,553 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2892876.0, ans=0.125
2023-10-09 22:33:51,727 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2892969.3333333335, ans=0.0
2023-10-09 22:33:52,773 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2892969.3333333335, ans=0.0
2023-10-09 22:33:56,335 INFO [train.py:1031] (2/4) Epoch 14, batch 35200, loss[loss=0.2102, simple_loss=0.2856, pruned_loss=0.0503, ctc_loss=0.08571, over 16761.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2867, pruned_loss=0.06254, ctc_loss=0.111, over 3293198.02 frames. ], batch size: 272, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:34:04,543 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=22.5
2023-10-09 22:34:59,204 INFO [train.py:1031] (2/4) Epoch 14, batch 35250, loss[loss=0.3075, simple_loss=0.3515, pruned_loss=0.09637, ctc_loss=0.1769, over 16547.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2844, pruned_loss=0.0609, ctc_loss=0.1082, over 3292918.59 frames. ], batch size: 351, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:35:13,149 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2893296.0, ans=0.125
2023-10-09 22:35:19,503 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.152e+02 2.987e+02 3.599e+02 4.398e+02 6.579e+02, threshold=7.198e+02, percent-clipped=0.0
2023-10-09 22:35:37,143 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2893342.6666666665, ans=0.125
2023-10-09 22:35:46,271 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=2893389.3333333335, ans=0.2
2023-10-09 22:35:57,467 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2893436.0, ans=0.0
2023-10-09 22:36:01,287 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2893436.0, ans=0.125
2023-10-09 22:36:05,979 INFO [train.py:1031] (2/4) Epoch 14, batch 35300, loss[loss=0.2499, simple_loss=0.3215, pruned_loss=0.06525, ctc_loss=0.1195, over 16776.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.292, pruned_loss=0.06192, ctc_loss=0.1101, over 3286799.13 frames. ], batch size: 272, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:36:06,589 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.55 vs. limit=10.0
2023-10-09 22:36:29,845 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2893529.3333333335, ans=0.1
2023-10-09 22:36:36,929 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2893576.0, ans=0.1
2023-10-09 22:36:45,175 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2893622.6666666665, ans=0.0
2023-10-09 22:37:01,830 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2893669.3333333335, ans=0.125
2023-10-09 22:37:10,998 INFO [train.py:1031] (2/4) Epoch 14, batch 35350, loss[loss=0.2601, simple_loss=0.3137, pruned_loss=0.07737, ctc_loss=0.1292, over 16852.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.2965, pruned_loss=0.06507, ctc_loss=0.1152, over 3291708.58 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:37:31,482 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.635e+02 3.418e+02 3.862e+02 4.842e+02 9.244e+02, threshold=7.725e+02, percent-clipped=2.0
2023-10-09 22:37:33,185 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.98 vs. limit=12.0
2023-10-09 22:37:34,486 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2893762.6666666665, ans=0.0
2023-10-09 22:37:50,530 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2893856.0, ans=0.125
2023-10-09 22:38:10,708 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2893902.6666666665, ans=0.1
2023-10-09 22:38:13,816 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=22.5
2023-10-09 22:38:14,134 INFO [train.py:1031] (2/4) Epoch 14, batch 35400, loss[loss=0.2173, simple_loss=0.2708, pruned_loss=0.06135, ctc_loss=0.1028, over 16662.00 frames. ], tot_loss[loss=0.2399, simple_loss=0.3014, pruned_loss=0.06587, ctc_loss=0.1166, over 3298976.47 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:38:26,640 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2893996.0, ans=0.1
2023-10-09 22:38:37,332 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2894042.6666666665, ans=0.1
2023-10-09 22:39:11,108 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2894136.0, ans=0.125
2023-10-09 22:39:14,645 INFO [train.py:1031] (2/4) Epoch 14, batch 35450, loss[loss=0.1938, simple_loss=0.2377, pruned_loss=0.05548, ctc_loss=0.09725, over 16815.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2944, pruned_loss=0.06544, ctc_loss=0.1155, over 3303130.05 frames. ], batch size: 176, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:39:19,888 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0
2023-10-09 22:39:21,511 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=2894182.6666666665, ans=0.0
2023-10-09 22:39:22,735 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=2894182.6666666665, ans=0.0
2023-10-09 22:39:24,393 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=2894182.6666666665, ans=0.125
2023-10-09 22:39:30,340 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=2894229.3333333335, ans=10.0
2023-10-09 22:39:33,026 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=2894229.3333333335, ans=0.0
2023-10-09 22:39:36,533 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.322e+02 3.230e+02 3.810e+02 4.860e+02 8.869e+02, threshold=7.620e+02, percent-clipped=1.0
2023-10-09 22:39:41,124 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2894276.0, ans=0.1
2023-10-09 22:39:43,259 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2894276.0, ans=0.04949747468305833
2023-10-09 22:39:51,589 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2894276.0, ans=0.0
2023-10-09 22:40:17,601 INFO [train.py:1031] (2/4) Epoch 14, batch 35500, loss[loss=0.2728, simple_loss=0.3153, pruned_loss=0.08501, ctc_loss=0.1506, over 16844.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2913, pruned_loss=0.06651, ctc_loss=0.1169, over 3296286.13 frames. ], batch size: 310, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:40:46,840 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2894509.3333333335, ans=0.025
2023-10-09 22:40:52,327 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=15.0
2023-10-09 22:41:20,534 INFO [train.py:1031] (2/4) Epoch 14, batch 35550, loss[loss=0.2905, simple_loss=0.3232, pruned_loss=0.09581, ctc_loss=0.1657, over 16609.00 frames. ], tot_loss[loss=0.2409, simple_loss=0.2943, pruned_loss=0.0694, ctc_loss=0.1214, over 3290993.16 frames. ], batch size: 418, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:41:34,422 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2894696.0, ans=0.125
2023-10-09 22:41:42,147 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.738e+02 3.683e+02 4.220e+02 5.051e+02 8.035e+02, threshold=8.441e+02, percent-clipped=1.0
2023-10-09 22:42:14,269 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2894836.0, ans=0.2
2023-10-09 22:42:16,427 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2894836.0, ans=0.0
2023-10-09 22:42:17,492 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2894836.0, ans=0.125
2023-10-09 22:42:22,017 INFO [train.py:1031] (2/4) Epoch 14, batch 35600, loss[loss=0.2077, simple_loss=0.2846, pruned_loss=0.04679, ctc_loss=0.09297, over 16894.00 frames. ], tot_loss[loss=0.2444, simple_loss=0.2973, pruned_loss=0.07092, ctc_loss=0.1242, over 3293499.13 frames. ], batch size: 215, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:42:35,205 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2894929.3333333335, ans=0.125
2023-10-09 22:43:07,509 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:43:23,145 INFO [train.py:1031] (2/4) Epoch 14, batch 35650, loss[loss=0.1916, simple_loss=0.2676, pruned_loss=0.04196, ctc_loss=0.07923, over 16903.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.2929, pruned_loss=0.06674, ctc_loss=0.1176, over 3290949.61 frames. ], batch size: 215, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:43:28,611 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:43:36,190 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2895162.6666666665, ans=0.125
2023-10-09 22:43:43,744 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=2895162.6666666665, ans=0.125
2023-10-09 22:43:47,003 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.989e+02 3.692e+02 4.285e+02 1.206e+03, threshold=7.384e+02, percent-clipped=2.0
2023-10-09 22:43:55,836 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2895209.3333333335, ans=10.0
2023-10-09 22:44:10,536 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:44:26,165 INFO [train.py:1031] (2/4) Epoch 14, batch 35700, loss[loss=0.2759, simple_loss=0.3266, pruned_loss=0.08454, ctc_loss=0.1405, over 16893.00 frames. ], tot_loss[loss=0.2391, simple_loss=0.2957, pruned_loss=0.06746, ctc_loss=0.1189, over 3294727.90 frames. ], batch size: 258, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:44:29,334 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2895349.3333333335, ans=0.125
2023-10-09 22:45:27,083 INFO [train.py:1031] (2/4) Epoch 14, batch 35750, loss[loss=0.2247, simple_loss=0.2724, pruned_loss=0.06649, ctc_loss=0.1102, over 16779.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2938, pruned_loss=0.0681, ctc_loss=0.1197, over 3298103.38 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:45:27,481 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2895582.6666666665, ans=0.125
2023-10-09 22:45:42,425 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0
2023-10-09 22:45:53,024 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.764e+02 4.390e+02 5.354e+02 1.212e+03, threshold=8.781e+02, percent-clipped=8.0
2023-10-09 22:45:54,066 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2895676.0, ans=0.1
2023-10-09 22:46:18,904 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2895769.3333333335, ans=0.125
2023-10-09 22:46:19,882 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2895769.3333333335, ans=0.0
2023-10-09 22:46:29,522 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0
2023-10-09 22:46:29,802 INFO [train.py:1031] (2/4) Epoch 14, batch 35800, loss[loss=0.253, simple_loss=0.3024, pruned_loss=0.07632, ctc_loss=0.1275, over 16752.00 frames. ], tot_loss[loss=0.2407, simple_loss=0.2936, pruned_loss=0.06952, ctc_loss=0.1216, over 3297624.42 frames. ], batch size: 188, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:46:37,135 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=2895816.0, ans=0.125
2023-10-09 22:47:03,568 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0
2023-10-09 22:47:31,694 INFO [train.py:1031] (2/4) Epoch 14, batch 35850, loss[loss=0.2444, simple_loss=0.314, pruned_loss=0.06436, ctc_loss=0.1152, over 16957.00 frames. ], tot_loss[loss=0.2423, simple_loss=0.2964, pruned_loss=0.06977, ctc_loss=0.1217, over 3299790.56 frames. ], batch size: 258, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:47:33,160 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2896049.3333333335, ans=0.125
2023-10-09 22:47:43,026 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2896096.0, ans=0.0
2023-10-09 22:47:46,795 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2896096.0, ans=0.0
2023-10-09 22:47:55,946 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=2896142.6666666665, ans=0.04949747468305833
2023-10-09 22:47:57,702 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 3.376e+02 4.105e+02 5.188e+02 8.758e+02, threshold=8.210e+02, percent-clipped=0.0
2023-10-09 22:48:17,221 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2896189.3333333335, ans=0.1
2023-10-09 22:48:27,410 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2896236.0, ans=0.1
2023-10-09 22:48:32,283 INFO [train.py:1031] (2/4) Epoch 14, batch 35900, loss[loss=0.1295, simple_loss=0.1752, pruned_loss=0.03113, ctc_loss=0.05376, over 16766.00 frames. ], tot_loss[loss=0.2317, simple_loss=0.293, pruned_loss=0.06302, ctc_loss=0.1108, over 3299959.06 frames. ], batch size: 70, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:48:40,802 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2896282.6666666665, ans=0.2
2023-10-09 22:48:51,281 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 22:49:21,860 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0
2023-10-09 22:49:34,427 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2896469.3333333335, ans=0.125
2023-10-09 22:49:36,317 INFO [train.py:1031] (2/4) Epoch 14, batch 35950, loss[loss=0.1732, simple_loss=0.2474, pruned_loss=0.03657, ctc_loss=0.06455, over 16854.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2913, pruned_loss=0.05956, ctc_loss=0.1055, over 3295093.28 frames. ], batch size: 202, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:49:43,239 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2896516.0, ans=0.1
2023-10-09 22:49:56,589 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=2896562.6666666665, ans=0.125
2023-10-09 22:50:04,034 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.666e+02 3.384e+02 4.357e+02 7.839e+02, threshold=6.768e+02, percent-clipped=0.0
2023-10-09 22:50:12,849 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.48 vs. limit=10.0
2023-10-09 22:50:15,162 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=2896656.0, ans=0.0
2023-10-09 22:50:15,213 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2896656.0, ans=0.1
2023-10-09 22:50:25,914 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2896702.6666666665, ans=0.125
2023-10-09 22:50:38,128 INFO [train.py:1031] (2/4) Epoch 14, batch 36000, loss[loss=0.2078, simple_loss=0.2691, pruned_loss=0.05502, ctc_loss=0.0909, over 16917.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.2823, pruned_loss=0.05497, ctc_loss=0.09749, over 3302476.68 frames. ], batch size: 82, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:50:38,129 INFO [train.py:1054] (2/4) Computing validation loss
2023-10-09 22:50:50,189 INFO [zipformer.py:1853] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.5834, 2.5344, 3.0983, 3.4163, 3.6433, 3.1832, 2.8904, 2.6183], device='cuda:2')
2023-10-09 22:50:58,874 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2335, simple_loss=0.304, pruned_loss=0.06295, ctc_loss=0.09275, over 1796401.00 frames.
2023-10-09 22:50:58,874 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB
2023-10-09 22:51:02,059 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=22.5
2023-10-09 22:51:21,545 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0
2023-10-09 22:51:59,924 INFO [train.py:1031] (2/4) Epoch 14, batch 36050, loss[loss=0.2224, simple_loss=0.2527, pruned_loss=0.07034, ctc_loss=0.1289, over 15462.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2799, pruned_loss=0.05618, ctc_loss=0.0994, over 3300820.71 frames. ], batch size: 526, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:52:03,403 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.12 vs. limit=15.0
2023-10-09 22:52:08,433 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2896982.6666666665, ans=0.125
2023-10-09 22:52:20,054 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2897029.3333333335, ans=0.2
2023-10-09 22:52:29,190 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+02 2.810e+02 3.555e+02 4.396e+02 7.920e+02, threshold=7.110e+02, percent-clipped=1.0
2023-10-09 22:52:48,231 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2897122.6666666665, ans=0.1
2023-10-09 22:52:54,245 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0
2023-10-09 22:52:56,829 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2897169.3333333335, ans=0.125
2023-10-09 22:53:02,992 INFO [train.py:1031] (2/4) Epoch 14, batch 36100, loss[loss=0.2257, simple_loss=0.2765, pruned_loss=0.0647, ctc_loss=0.1136, over 16897.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2832, pruned_loss=0.05984, ctc_loss=0.1055, over 3303171.02 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:53:11,024 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=2897216.0, ans=0.125
2023-10-09 22:53:23,494 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2897262.6666666665, ans=0.125
2023-10-09 22:53:26,719 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2897262.6666666665, ans=0.0
2023-10-09 22:53:35,255 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2897309.3333333335, ans=0.125
2023-10-09 22:53:36,267 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2897309.3333333335, ans=0.125
2023-10-09 22:53:47,714 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2897356.0, ans=0.1
2023-10-09 22:54:06,392 INFO [train.py:1031] (2/4) Epoch 14, batch 36150, loss[loss=0.219, simple_loss=0.2703, pruned_loss=0.0622, ctc_loss=0.1085, over 16833.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2848, pruned_loss=0.06144, ctc_loss=0.108, over 3311294.69 frames. ], batch size: 164, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 22:54:11,592 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2897449.3333333335, ans=0.125
2023-10-09 22:54:28,705 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=15.0
2023-10-09 22:54:36,549 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.766e+02 3.412e+02 4.167e+02 5.128e+02 1.236e+03, threshold=8.334e+02, percent-clipped=3.0
2023-10-09 22:55:00,814 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=2897636.0, ans=0.0
2023-10-09 22:55:08,376 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.98 vs. limit=10.0
2023-10-09 22:55:09,644 INFO [train.py:1031] (2/4) Epoch 14, batch 36200, loss[loss=0.2124, simple_loss=0.2737, pruned_loss=0.05606, ctc_loss=0.09752, over 16707.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.287, pruned_loss=0.06357, ctc_loss=0.1121, over 3305280.05 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:55:12,836 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2897682.6666666665, ans=0.125
2023-10-09 22:55:21,879 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0
2023-10-09 22:55:24,138 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=22.5
2023-10-09 22:55:28,629 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.88 vs. limit=10.0
2023-10-09 22:55:30,420 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2897729.3333333335, ans=0.125
2023-10-09 22:55:39,001 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2897776.0, ans=0.2
2023-10-09 22:56:11,655 INFO [train.py:1031] (2/4) Epoch 14, batch 36250, loss[loss=0.2496, simple_loss=0.3179, pruned_loss=0.06415, ctc_loss=0.1324, over 16865.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.2925, pruned_loss=0.06401, ctc_loss=0.1145, over 3309950.69 frames. ], batch size: 292, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:56:28,796 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2897962.6666666665, ans=0.05
2023-10-09 22:56:42,283 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.544e+02 3.450e+02 4.069e+02 4.879e+02 1.069e+03, threshold=8.138e+02, percent-clipped=4.0
2023-10-09 22:56:47,694 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2898056.0, ans=0.125
2023-10-09 22:57:13,561 INFO [train.py:1031] (2/4) Epoch 14, batch 36300, loss[loss=0.2146, simple_loss=0.2808, pruned_loss=0.05503, ctc_loss=0.09568, over 16320.00 frames. ], tot_loss[loss=0.2318, simple_loss=0.2903, pruned_loss=0.06384, ctc_loss=0.1138, over 3312875.00 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:57:20,756 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0
2023-10-09 22:57:49,190 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.22 vs. limit=15.0
2023-10-09 22:58:05,684 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2898336.0, ans=0.0
2023-10-09 22:58:08,894 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2898336.0, ans=0.2
2023-10-09 22:58:16,203 INFO [train.py:1031] (2/4) Epoch 14, batch 36350, loss[loss=0.2664, simple_loss=0.3152, pruned_loss=0.08091, ctc_loss=0.1395, over 16714.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.2919, pruned_loss=0.06539, ctc_loss=0.1159, over 3302357.20 frames. ], batch size: 272, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:58:16,587 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2898382.6666666665, ans=0.2
2023-10-09 22:58:28,640 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=2898429.3333333335, ans=0.0
2023-10-09 22:58:37,759 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=6.0
2023-10-09 22:58:48,784 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.608e+02 3.450e+02 4.170e+02 4.968e+02 1.204e+03, threshold=8.340e+02, percent-clipped=3.0
2023-10-09 22:58:50,794 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2898476.0, ans=0.125
2023-10-09 22:58:55,600 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2898522.6666666665, ans=0.0
2023-10-09 22:58:57,221 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.30 vs. limit=15.0
2023-10-09 22:59:19,337 INFO [train.py:1031] (2/4) Epoch 14, batch 36400, loss[loss=0.2122, simple_loss=0.2593, pruned_loss=0.0615, ctc_loss=0.1053, over 16763.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2917, pruned_loss=0.06674, ctc_loss=0.1177, over 3301668.02 frames. ], batch size: 242, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 22:59:25,569 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0
2023-10-09 22:59:41,570 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2898662.6666666665, ans=0.1
2023-10-09 22:59:51,272 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2898709.3333333335, ans=0.125
2023-10-09 23:00:02,486 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2898756.0, ans=0.1
2023-10-09 23:00:09,377 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2898802.6666666665, ans=0.0
2023-10-09 23:00:21,499 INFO [train.py:1031] (2/4) Epoch 14, batch 36450, loss[loss=0.2059, simple_loss=0.2567, pruned_loss=0.05734, ctc_loss=0.1013, over 16809.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2847, pruned_loss=0.06578, ctc_loss=0.1157, over 3301760.27 frames. ], batch size: 202, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:00:22,928 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2898849.3333333335, ans=0.0
2023-10-09 23:00:23,893 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2898849.3333333335, ans=0.125
2023-10-09 23:00:43,067 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2898896.0, ans=0.0
2023-10-09 23:00:50,202 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=2898942.6666666665, ans=0.0
2023-10-09 23:00:54,968 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.330e+02 3.085e+02 3.494e+02 4.091e+02 1.458e+03, threshold=6.988e+02, percent-clipped=1.0
2023-10-09 23:00:55,399 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2898942.6666666665, ans=0.125
2023-10-09 23:01:06,079 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2898989.3333333335, ans=0.125
2023-10-09 23:01:13,127 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2899036.0, ans=0.125
2023-10-09 23:01:17,418 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=2899036.0, ans=0.04949747468305833
2023-10-09 23:01:17,957 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=22.5
2023-10-09 23:01:24,215 INFO [train.py:1031] (2/4) Epoch 14, batch 36500, loss[loss=0.193, simple_loss=0.2365, pruned_loss=0.05541, ctc_loss=0.0968, over 16860.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2781, pruned_loss=0.0648, ctc_loss=0.114, over 3308897.50 frames. ], batch size: 203, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:01:29,415 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2899082.6666666665, ans=0.125
2023-10-09 23:01:43,240 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2899129.3333333335, ans=0.125
2023-10-09 23:01:53,668 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=2899176.0, ans=0.125
2023-10-09 23:02:13,364 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=2899222.6666666665, ans=22.5
2023-10-09 23:02:17,535 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=15.0
2023-10-09 23:02:25,279 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2899269.3333333335, ans=0.0
2023-10-09 23:02:27,713 INFO [train.py:1031] (2/4) Epoch 14, batch 36550, loss[loss=0.2474, simple_loss=0.3141, pruned_loss=0.06608, ctc_loss=0.1212, over 16574.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2774, pruned_loss=0.06388, ctc_loss=0.1125, over 3313604.85 frames. ], batch size: 351, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:02:42,983 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2899362.6666666665, ans=0.125
2023-10-09 23:03:01,170 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.250e+02 3.665e+02 4.225e+02 1.129e+03, threshold=7.330e+02, percent-clipped=1.0
2023-10-09 23:03:04,428 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2899456.0, ans=0.125
2023-10-09 23:03:06,688 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=22.5
2023-10-09 23:03:28,787 INFO [train.py:1031] (2/4) Epoch 14, batch 36600, loss[loss=0.1487, simple_loss=0.2168, pruned_loss=0.02949, ctc_loss=0.05413, over 10887.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.2755, pruned_loss=0.0626, ctc_loss=0.1108, over 3308437.12 frames. ], batch size: 37, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:03:31,201 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2899549.3333333335, ans=0.125
2023-10-09 23:03:35,086 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2899549.3333333335, ans=0.07
2023-10-09 23:03:48,371 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2899596.0, ans=0.1
2023-10-09 23:03:48,433 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2899596.0, ans=0.025
2023-10-09 23:04:14,061 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=2899689.3333333335, ans=0.2
2023-10-09 23:04:16,636 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0
2023-10-09 23:04:30,819 INFO [train.py:1031] (2/4) Epoch 14, batch 36650, loss[loss=0.2266, simple_loss=0.2799, pruned_loss=0.06489, ctc_loss=0.1088, over 16916.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2695, pruned_loss=0.0605, ctc_loss=0.1073, over 3309083.13 frames. ], batch size: 78, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:04:41,367 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=2899782.6666666665, ans=0.125
2023-10-09 23:04:44,332 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2899829.3333333335, ans=0.1
2023-10-09 23:04:52,985 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2899829.3333333335, ans=0.2
2023-10-09 23:04:59,506 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2899876.0, ans=0.0
2023-10-09 23:05:06,012 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.119e+02 3.036e+02 3.411e+02 4.060e+02 1.638e+03, threshold=6.823e+02, percent-clipped=3.0
2023-10-09 23:05:33,283 INFO [train.py:1031] (2/4) Epoch 14, batch 36700, loss[loss=0.2151, simple_loss=0.2498, pruned_loss=0.06567, ctc_loss=0.1225, over 16446.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2653, pruned_loss=0.06009, ctc_loss=0.1061, over 3310709.34 frames. ], batch size: 416, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:05:53,690 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2900062.6666666665, ans=0.125
2023-10-09 23:05:56,348 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=2900109.3333333335, ans=0.09899494936611666
2023-10-09 23:06:13,281 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=2900156.0, ans=0.2
2023-10-09 23:06:34,442 INFO [train.py:1031] (2/4) Epoch 14, batch 36750, loss[loss=0.2538, simple_loss=0.2951, pruned_loss=0.07836, ctc_loss=0.1393, over 16955.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2669, pruned_loss=0.06194, ctc_loss=0.1089, over 3315747.73 frames. ], batch size: 309, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:06:37,876 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2900249.3333333335, ans=0.0
2023-10-09 23:06:37,911 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2900249.3333333335, ans=0.125
2023-10-09 23:06:55,632 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=2900296.0, ans=0.0
2023-10-09 23:07:09,709 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.459e+02 3.153e+02 3.511e+02 4.063e+02 5.415e+02, threshold=7.022e+02, percent-clipped=0.0
2023-10-09 23:07:30,311 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=12.0
2023-10-09 23:07:31,269 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0
2023-10-09 23:07:34,281 INFO [train.py:1031] (2/4) Epoch 14, batch 36800, loss[loss=0.1709, simple_loss=0.2044, pruned_loss=0.05134, ctc_loss=0.08689, over 16952.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2686, pruned_loss=0.06276, ctc_loss=0.1097, over 3315837.37 frames. ], batch size: 78, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:08:25,773 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=2900669.3333333335, ans=0.125
2023-10-09 23:08:27,249 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=2900669.3333333335, ans=0.125
2023-10-09 23:08:27,329 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=2900669.3333333335, ans=0.125
2023-10-09 23:08:35,593 INFO [train.py:1031] (2/4) Epoch 14, batch 36850, loss[loss=0.2424, simple_loss=0.287, pruned_loss=0.07634, ctc_loss=0.1126, over 11823.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2695, pruned_loss=0.06193, ctc_loss=0.1075, over 3306386.08 frames. ], batch size: 38, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:08:39,072 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2900716.0, ans=0.125
2023-10-09 23:09:10,632 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2900809.3333333335, ans=0.125
2023-10-09 23:09:12,544 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=2900856.0, ans=0.0
2023-10-09 23:09:16,103 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.524e+02 3.452e+02 4.218e+02 5.067e+02 9.154e+02, threshold=8.437e+02, percent-clipped=6.0
2023-10-09 23:09:38,264 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0
2023-10-09 23:09:38,548 INFO [train.py:1031] (2/4) Epoch 14, batch 36900, loss[loss=0.236, simple_loss=0.2898, pruned_loss=0.06686, ctc_loss=0.1213, over 16200.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.278, pruned_loss=0.06493, ctc_loss=0.1128, over 3311901.20 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:09:42,706 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2900949.3333333335, ans=0.125
2023-10-09 23:10:03,816 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2901042.6666666665, ans=0.125
2023-10-09 23:10:27,167 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=2901089.3333333335, ans=0.125
2023-10-09 23:10:33,220 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.02 vs. limit=12.0
2023-10-09 23:10:41,452 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2901136.0, ans=0.0
2023-10-09 23:10:43,336 INFO [train.py:1031] (2/4) Epoch 14, batch 36950, loss[loss=0.2605, simple_loss=0.2889, pruned_loss=0.08557, ctc_loss=0.1521, over 15304.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2855, pruned_loss=0.06836, ctc_loss=0.1186, over 3295798.42 frames. ], batch size: 527, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:10:43,660 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2901182.6666666665, ans=0.1
2023-10-09 23:10:45,408 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-10-09 23:10:55,884 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2901229.3333333335, ans=0.2
2023-10-09 23:11:04,001 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=2901229.3333333335, ans=0.125
2023-10-09 23:11:19,955 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.16 vs. limit=12.0
2023-10-09 23:11:25,045 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.810e+02 3.608e+02 4.056e+02 4.983e+02 1.030e+03, threshold=8.112e+02, percent-clipped=3.0
2023-10-09 23:11:29,363 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2901322.6666666665, ans=0.95
2023-10-09 23:11:35,106 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2901369.3333333335, ans=0.0
2023-10-09 23:11:35,412 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.25 vs. limit=12.0
2023-10-09 23:11:46,858 INFO [train.py:1031] (2/4) Epoch 14, batch 37000, loss[loss=0.2701, simple_loss=0.329, pruned_loss=0.08089, ctc_loss=0.1234, over 12964.00 frames. ], tot_loss[loss=0.2392, simple_loss=0.292, pruned_loss=0.06913, ctc_loss=0.1202, over 3298581.71 frames. ], batch size: 38, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:11:51,762 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2901416.0, ans=0.1
2023-10-09 23:11:57,085 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2901416.0, ans=0.125
2023-10-09 23:11:58,508 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.49 vs. limit=15.0
2023-10-09 23:12:17,141 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=2901509.3333333335, ans=0.125
2023-10-09 23:12:22,614 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2901509.3333333335, ans=0.125
2023-10-09 23:12:34,067 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2901556.0, ans=0.2
2023-10-09 23:12:40,633 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2901602.6666666665, ans=0.05
2023-10-09 23:12:49,889 INFO [train.py:1031] (2/4) Epoch 14, batch 37050, loss[loss=0.2205, simple_loss=0.2694, pruned_loss=0.0637, ctc_loss=0.1103, over 16705.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2871, pruned_loss=0.06813, ctc_loss=0.1187, over 3297452.39 frames. ], batch size: 272, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:13:10,613 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2901696.0, ans=0.1
2023-10-09 23:13:25,473 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2901742.6666666665, ans=0.125
2023-10-09 23:13:31,556 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.430e+02 3.263e+02 3.806e+02 4.315e+02 8.340e+02, threshold=7.611e+02, percent-clipped=1.0
2023-10-09 23:13:51,986 INFO [train.py:1031] (2/4) Epoch 14, batch 37100, loss[loss=0.2065, simple_loss=0.2551, pruned_loss=0.05899, ctc_loss=0.09992, over 16748.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2794, pruned_loss=0.06657, ctc_loss=0.1161, over 3305565.11 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:14:12,067 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2901929.3333333335, ans=0.125
2023-10-09 23:14:18,553 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2901976.0, ans=0.125
2023-10-09 23:14:21,773 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0
2023-10-09 23:14:48,922 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=12.0
2023-10-09 23:14:53,066 INFO [train.py:1031] (2/4) Epoch 14, batch 37150, loss[loss=0.2024, simple_loss=0.2514, pruned_loss=0.05756, ctc_loss=0.09567, over 16755.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2736, pruned_loss=0.06506, ctc_loss=0.113, over 3309561.65 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:15:00,223 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2902116.0, ans=0.0
2023-10-09 23:15:22,455 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2902209.3333333335, ans=0.0
2023-10-09 23:15:26,376 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2902209.3333333335, ans=0.125
2023-10-09 23:15:34,535 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.232e+02 3.053e+02 3.584e+02 4.083e+02 7.481e+02, threshold=7.169e+02, percent-clipped=0.0
2023-10-09 23:15:54,258 INFO [train.py:1031] (2/4) Epoch 14, batch 37200, loss[loss=0.2309, simple_loss=0.3264, pruned_loss=0.04949, ctc_loss=0.09083, over 15190.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2773, pruned_loss=0.06367, ctc_loss=0.111, over 3293955.04 frames. ], batch size: 526, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:15:59,259 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2902349.3333333335, ans=0.125
2023-10-09 23:16:05,121 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2902396.0, ans=0.125
2023-10-09 23:16:09,813 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2902396.0, ans=0.1
2023-10-09 23:16:18,318 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2902442.6666666665, ans=0.2
2023-10-09 23:16:20,036 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2902442.6666666665, ans=0.0
2023-10-09 23:16:41,381 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=2902536.0, ans=0.2
2023-10-09 23:16:43,580 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2902536.0, ans=0.125
2023-10-09 23:16:53,929 INFO [train.py:1031] (2/4) Epoch 14, batch 37250, loss[loss=0.2314, simple_loss=0.3034, pruned_loss=0.05955, ctc_loss=0.1006, over 16339.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2809, pruned_loss=0.06174, ctc_loss=0.108, over 3295447.16 frames. ], batch size: 70, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:17:03,744 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0
2023-10-09 23:17:06,941 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0
2023-10-09 23:17:09,499 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2902629.3333333335, ans=0.125
2023-10-09 23:17:21,507 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=2902676.0, ans=0.2
2023-10-09 23:17:32,662 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=2902722.6666666665, ans=0.0
2023-10-09 23:17:36,499 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 2.936e+02 3.384e+02 3.917e+02 6.225e+02, threshold=6.767e+02, percent-clipped=0.0
2023-10-09 23:17:36,811 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2902722.6666666665, ans=0.0
2023-10-09 23:17:48,181 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2902769.3333333335, ans=0.1
2023-10-09 23:17:49,202 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=2902769.3333333335, ans=0.125
2023-10-09 23:17:54,217 INFO [train.py:1031] (2/4) Epoch 14, batch 37300, loss[loss=0.2051, simple_loss=0.2875, pruned_loss=0.04545, ctc_loss=0.07969, over 16858.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2792, pruned_loss=0.06018, ctc_loss=0.1054, over 3304274.89 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:17:56,667 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=2902816.0, ans=0.2
2023-10-09 23:18:16,732 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2902862.6666666665, ans=0.125
2023-10-09 23:18:25,957 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2902909.3333333335, ans=0.2
2023-10-09 23:18:32,096 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.41 vs. limit=22.5
2023-10-09 23:18:52,765 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2903002.6666666665, ans=0.125
2023-10-09 23:18:55,640 INFO [train.py:1031] (2/4) Epoch 14, batch 37350, loss[loss=0.175, simple_loss=0.2429, pruned_loss=0.03956, ctc_loss=0.06971, over 11693.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2802, pruned_loss=0.05824, ctc_loss=0.102, over 3289891.92 frames. ], batch size: 39, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:18:58,066 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2903049.3333333335, ans=0.0
2023-10-09 23:19:05,885 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=2903049.3333333335, ans=0.0
2023-10-09 23:19:10,963 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2903096.0, ans=0.125
2023-10-09 23:19:15,757 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=2903096.0, ans=0.125
2023-10-09 23:19:38,026 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.348e+02 2.949e+02 3.528e+02 4.105e+02 1.147e+03, threshold=7.057e+02, percent-clipped=0.0
2023-10-09 23:19:45,322 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=2903236.0, ans=0.125
2023-10-09 23:19:54,563 INFO [train.py:1031] (2/4) Epoch 14, batch 37400, loss[loss=0.2276, simple_loss=0.2695, pruned_loss=0.0679, ctc_loss=0.1247, over 16806.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2758, pruned_loss=0.05862, ctc_loss=0.1024, over 3289879.25 frames. ], batch size: 329, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:20:16,419 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2903329.3333333335, ans=0.125
2023-10-09 23:20:17,568 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2903376.0, ans=0.2
2023-10-09 23:20:52,024 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2903469.3333333335, ans=0.125
2023-10-09 23:20:54,833 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=2903516.0, ans=0.125
2023-10-09 23:20:55,594 INFO [train.py:1031] (2/4) Epoch 14, batch 37450, loss[loss=0.1915, simple_loss=0.2605, pruned_loss=0.04428, ctc_loss=0.08474, over 16800.00 frames. ], tot_loss[loss=0.2158, simple_loss=0.2745, pruned_loss=0.0582, ctc_loss=0.1019, over 3295723.05 frames. ], batch size: 188, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:21:07,153 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.31 vs. limit=6.0
2023-10-09 23:21:17,935 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2903562.6666666665, ans=0.125
2023-10-09 23:21:23,934 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2903609.3333333335, ans=0.2
2023-10-09 23:21:41,546 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.328e+02 2.982e+02 3.878e+02 4.493e+02 7.805e+02, threshold=7.755e+02, percent-clipped=2.0
2023-10-09 23:21:58,571 INFO [train.py:1031] (2/4) Epoch 14, batch 37500, loss[loss=0.2326, simple_loss=0.3087, pruned_loss=0.05765, ctc_loss=0.1031, over 16880.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.28, pruned_loss=0.0601, ctc_loss=0.1058, over 3293689.76 frames. ], batch size: 242, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:22:16,661 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-10-09 23:22:39,190 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=2903889.3333333335, ans=0.2
2023-10-09 23:22:48,588 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0
2023-10-09 23:22:56,199 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.75 vs. limit=10.0
2023-10-09 23:22:58,595 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2903982.6666666665, ans=0.125
2023-10-09 23:22:59,329 INFO [train.py:1031] (2/4) Epoch 14, batch 37550, loss[loss=0.1638, simple_loss=0.2287, pruned_loss=0.03694, ctc_loss=0.06262, over 16781.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2782, pruned_loss=0.05772, ctc_loss=0.1022, over 3292179.30 frames. ], batch size: 164, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:23:22,177 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2904076.0, ans=0.1
2023-10-09 23:23:35,622 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=2904122.6666666665, ans=0.125
2023-10-09 23:23:38,432 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=2904122.6666666665, ans=0.0
2023-10-09 23:23:46,184 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 2.897e+02 3.320e+02 4.036e+02 7.809e+02, threshold=6.640e+02, percent-clipped=1.0
2023-10-09 23:24:00,648 INFO [train.py:1031] (2/4) Epoch 14, batch 37600, loss[loss=0.1992, simple_loss=0.2502, pruned_loss=0.05378, ctc_loss=0.1014, over 16797.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2735, pruned_loss=0.05802, ctc_loss=0.1022, over 3281684.17 frames. ], batch size: 202, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:24:05,229 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=2904216.0, ans=0.125
2023-10-09 23:24:08,401 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-10-09 23:24:33,473 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.08 vs. limit=22.5
2023-10-09 23:24:50,506 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=2904402.6666666665, ans=10.0
2023-10-09 23:24:59,293 INFO [train.py:1031] (2/4) Epoch 14, batch 37650, loss[loss=0.2097, simple_loss=0.264, pruned_loss=0.05771, ctc_loss=0.09996, over 16787.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2729, pruned_loss=0.05965, ctc_loss=0.1045, over 3276307.85 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:25:13,108 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.62 vs. limit=10.0
2023-10-09 23:25:48,254 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.460e+02 3.454e+02 4.118e+02 4.727e+02 1.151e+03, threshold=8.236e+02, percent-clipped=7.0
2023-10-09 23:26:01,812 INFO [train.py:1031] (2/4) Epoch 14, batch 37700, loss[loss=0.1848, simple_loss=0.268, pruned_loss=0.03656, ctc_loss=0.071, over 16852.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2755, pruned_loss=0.05958, ctc_loss=0.1043, over 3275386.47 frames. ], batch size: 215, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:26:55,655 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2904869.3333333335, ans=0.125
2023-10-09 23:27:04,452 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2904916.0, ans=0.125
2023-10-09 23:27:05,162 INFO [train.py:1031] (2/4) Epoch 14, batch 37750, loss[loss=0.1942, simple_loss=0.2517, pruned_loss=0.05109, ctc_loss=0.08645, over 16937.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2749, pruned_loss=0.0557, ctc_loss=0.09856, over 3274694.66 frames. ], batch size: 82, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:27:21,551 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. limit=10.0
2023-10-09 23:27:42,970 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2905056.0, ans=0.1
2023-10-09 23:27:56,212 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.100e+02 2.858e+02 3.601e+02 4.405e+02 1.102e+03, threshold=7.202e+02, percent-clipped=1.0
2023-10-09 23:28:07,550 INFO [train.py:1031] (2/4) Epoch 14, batch 37800, loss[loss=0.2437, simple_loss=0.3378, pruned_loss=0.0535, ctc_loss=0.1065, over 16261.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2836, pruned_loss=0.05834, ctc_loss=0.1038, over 3269196.63 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:28:17,349 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2905149.3333333335, ans=0.125
2023-10-09 23:28:18,490 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2905149.3333333335, ans=0.1
2023-10-09 23:28:30,953 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2905196.0, ans=0.1
2023-10-09 23:28:34,629 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2905242.6666666665, ans=0.0
2023-10-09 23:28:35,820 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2905242.6666666665, ans=0.125
2023-10-09 23:28:40,442 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0
2023-10-09 23:28:51,319 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=2905289.3333333335, ans=0.2
2023-10-09 23:29:03,746 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2905336.0, ans=0.1
2023-10-09 23:29:08,605 INFO [train.py:1031] (2/4) Epoch 14, batch 37850, loss[loss=0.2137, simple_loss=0.2835, pruned_loss=0.05301, ctc_loss=0.09478, over 16976.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2898, pruned_loss=0.05782, ctc_loss=0.1035, over 3284405.72 frames. ], batch size: 216, lr: 2.52e-03, grad_scale: 2.0
2023-10-09 23:29:10,583 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2905382.6666666665, ans=0.0
2023-10-09 23:29:10,865 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.21 vs. limit=10.0
2023-10-09 23:29:21,783 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2905429.3333333335, ans=0.125
2023-10-09 23:29:47,124 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2905522.6666666665, ans=0.2
2023-10-09 23:29:57,452 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2905522.6666666665, ans=0.125
2023-10-09 23:30:00,198 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2905569.3333333335, ans=0.1
2023-10-09 23:30:00,965 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.344e+02 3.190e+02 3.752e+02 4.348e+02 7.334e+02, threshold=7.503e+02, percent-clipped=1.0
2023-10-09 23:30:13,338 INFO [train.py:1031] (2/4) Epoch 14, batch 37900, loss[loss=0.2513, simple_loss=0.2999, pruned_loss=0.07444, ctc_loss=0.1346, over 16281.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2921, pruned_loss=0.06074, ctc_loss=0.108, over 3284031.87 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 4.0
2023-10-09 23:30:25,870 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2905662.6666666665, ans=0.2
2023-10-09 23:30:47,561 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=15.0
2023-10-09 23:30:52,222 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2905756.0, ans=0.2
2023-10-09 23:31:01,194 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.84 vs. limit=22.5
2023-10-09 23:31:13,598 INFO [train.py:1031] (2/4) Epoch 14, batch 37950, loss[loss=0.2186, simple_loss=0.2666, pruned_loss=0.06363, ctc_loss=0.1086, over 16766.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.2921, pruned_loss=0.06317, ctc_loss=0.1118, over 3282265.30 frames.
], batch size: 164, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:31:50,242 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=2905989.3333333335, ans=0.125 2023-10-09 23:31:57,390 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2905989.3333333335, ans=0.2 2023-10-09 23:32:05,288 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.613e+02 3.279e+02 3.863e+02 4.623e+02 8.979e+02, threshold=7.726e+02, percent-clipped=3.0 2023-10-09 23:32:11,499 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=2906036.0, ans=0.125 2023-10-09 23:32:14,788 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=2906082.6666666665, ans=0.0 2023-10-09 23:32:15,503 INFO [train.py:1031] (2/4) Epoch 14, batch 38000, loss[loss=0.2196, simple_loss=0.2708, pruned_loss=0.06372, ctc_loss=0.1025, over 16927.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.286, pruned_loss=0.06392, ctc_loss=0.1128, over 3269951.77 frames. ], batch size: 78, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:32:23,928 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=2906082.6666666665, ans=0.125 2023-10-09 23:32:27,680 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=2906129.3333333335, ans=0.2 2023-10-09 23:32:34,131 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2906129.3333333335, ans=0.0 2023-10-09 23:32:49,943 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=2906176.0, ans=0.0 2023-10-09 23:32:52,114 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2906222.6666666665, ans=0.125 2023-10-09 23:33:04,053 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=2906269.3333333335, ans=0.05 2023-10-09 23:33:13,250 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2906269.3333333335, ans=0.0 2023-10-09 23:33:16,626 INFO [train.py:1031] (2/4) Epoch 14, batch 38050, loss[loss=0.2362, simple_loss=0.2911, pruned_loss=0.0678, ctc_loss=0.114, over 16891.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2835, pruned_loss=0.06421, ctc_loss=0.1133, over 3276528.57 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:33:34,762 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.63 vs. limit=15.0 2023-10-09 23:33:39,344 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.55 vs. limit=15.0 2023-10-09 23:33:54,593 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.02 vs. 
limit=10.0 2023-10-09 23:33:59,757 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2906456.0, ans=0.05 2023-10-09 23:34:10,176 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.516e+02 3.275e+02 3.696e+02 4.420e+02 6.425e+02, threshold=7.391e+02, percent-clipped=0.0 2023-10-09 23:34:18,444 INFO [train.py:1031] (2/4) Epoch 14, batch 38100, loss[loss=0.2325, simple_loss=0.2972, pruned_loss=0.06268, ctc_loss=0.1064, over 16784.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2868, pruned_loss=0.06605, ctc_loss=0.1164, over 3282502.61 frames. ], batch size: 164, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:34:21,449 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2906549.3333333335, ans=0.1 2023-10-09 23:34:27,043 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2906549.3333333335, ans=0.125 2023-10-09 23:34:30,937 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=2906596.0, ans=0.125 2023-10-09 23:34:59,825 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2906689.3333333335, ans=0.125 2023-10-09 23:35:05,811 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2906689.3333333335, ans=0.1 2023-10-09 23:35:23,913 INFO [train.py:1031] (2/4) Epoch 14, batch 38150, loss[loss=0.2639, simple_loss=0.3768, pruned_loss=0.05278, ctc_loss=0.1133, over 15106.00 frames. ], tot_loss[loss=0.244, simple_loss=0.2991, pruned_loss=0.06968, ctc_loss=0.1236, over 3282767.43 frames. ], batch size: 526, lr: 2.52e-03, grad_scale: 1.0 2023-10-09 23:35:30,867 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=2906782.6666666665, ans=0.125 2023-10-09 23:35:33,717 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2906782.6666666665, ans=0.1 2023-10-09 23:36:00,107 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.05 vs. limit=10.0 2023-10-09 23:36:22,910 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.043e+02 3.850e+02 4.496e+02 5.551e+02 1.259e+03, threshold=8.992e+02, percent-clipped=8.0 2023-10-09 23:36:28,325 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=2907016.0, ans=0.0 2023-10-09 23:36:29,638 INFO [train.py:1031] (2/4) Epoch 14, batch 38200, loss[loss=0.2731, simple_loss=0.3289, pruned_loss=0.08022, ctc_loss=0.1423, over 16702.00 frames. ], tot_loss[loss=0.2492, simple_loss=0.3044, pruned_loss=0.0716, ctc_loss=0.1269, over 3283125.54 frames. 
], batch size: 272, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:36:31,787 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=2907016.0, ans=0.2 2023-10-09 23:36:35,126 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2907016.0, ans=0.0 2023-10-09 23:37:00,415 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2907109.3333333335, ans=0.0 2023-10-09 23:37:05,464 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:37:09,353 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2907156.0, ans=0.0 2023-10-09 23:37:19,602 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2907156.0, ans=0.5 2023-10-09 23:37:26,877 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.25 vs. limit=15.0 2023-10-09 23:37:31,431 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=2907202.6666666665, ans=0.2 2023-10-09 23:37:33,238 INFO [train.py:1031] (2/4) Epoch 14, batch 38250, loss[loss=0.216, simple_loss=0.274, pruned_loss=0.05909, ctc_loss=0.09948, over 16856.00 frames. ], tot_loss[loss=0.2471, simple_loss=0.306, pruned_loss=0.0694, ctc_loss=0.1235, over 3288531.27 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:38:29,118 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.523e+02 3.325e+02 3.787e+02 4.498e+02 1.020e+03, threshold=7.574e+02, percent-clipped=1.0 2023-10-09 23:38:34,793 INFO [train.py:1031] (2/4) Epoch 14, batch 38300, loss[loss=0.2102, simple_loss=0.2591, pruned_loss=0.05997, ctc_loss=0.1035, over 16706.00 frames. ], tot_loss[loss=0.2444, simple_loss=0.3013, pruned_loss=0.06926, ctc_loss=0.1225, over 3294903.99 frames. ], batch size: 151, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:38:45,438 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=2907482.6666666665, ans=0.04949747468305833 2023-10-09 23:38:53,128 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.78 vs. limit=15.0 2023-10-09 23:39:27,719 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2023-10-09 23:39:36,988 INFO [train.py:1031] (2/4) Epoch 14, batch 38350, loss[loss=0.269, simple_loss=0.3429, pruned_loss=0.07088, ctc_loss=0.1335, over 15260.00 frames. ], tot_loss[loss=0.2465, simple_loss=0.3041, pruned_loss=0.06975, ctc_loss=0.1233, over 3294896.45 frames. 
], batch size: 527, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:39:50,034 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=2907762.6666666665, ans=0.2 2023-10-09 23:39:56,993 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2907762.6666666665, ans=0.125 2023-10-09 23:40:01,950 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=2907809.3333333335, ans=0.05 2023-10-09 23:40:19,999 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0 2023-10-09 23:40:27,088 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2907902.6666666665, ans=0.0 2023-10-09 23:40:29,975 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=2907902.6666666665, ans=0.2 2023-10-09 23:40:35,688 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.536e+02 3.571e+02 4.561e+02 5.514e+02 1.040e+03, threshold=9.121e+02, percent-clipped=3.0 2023-10-09 23:40:41,304 INFO [train.py:1031] (2/4) Epoch 14, batch 38400, loss[loss=0.2602, simple_loss=0.3204, pruned_loss=0.07573, ctc_loss=0.1215, over 16971.00 frames. ], tot_loss[loss=0.2507, simple_loss=0.3078, pruned_loss=0.07148, ctc_loss=0.1264, over 3299177.27 frames. ], batch size: 91, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:40:58,465 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2907996.0, ans=0.0 2023-10-09 23:41:09,222 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2908042.6666666665, ans=0.0 2023-10-09 23:41:44,684 INFO [train.py:1031] (2/4) Epoch 14, batch 38450, loss[loss=0.2059, simple_loss=0.2778, pruned_loss=0.04958, ctc_loss=0.08708, over 16840.00 frames. ], tot_loss[loss=0.2488, simple_loss=0.3064, pruned_loss=0.07062, ctc_loss=0.1248, over 3301448.75 frames. ], batch size: 95, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:42:21,733 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=2908322.6666666665, ans=0.125 2023-10-09 23:42:22,867 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2908322.6666666665, ans=0.0 2023-10-09 23:42:38,560 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2908369.3333333335, ans=0.125 2023-10-09 23:42:42,962 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.189e+02 3.714e+02 4.508e+02 1.225e+03, threshold=7.428e+02, percent-clipped=2.0 2023-10-09 23:42:47,041 INFO [train.py:1031] (2/4) Epoch 14, batch 38500, loss[loss=0.3332, simple_loss=0.3549, pruned_loss=0.1147, ctc_loss=0.2051, over 16501.00 frames. ], tot_loss[loss=0.2466, simple_loss=0.3057, pruned_loss=0.06927, ctc_loss=0.1225, over 3297455.69 frames. 
], batch size: 416, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:42:47,370 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2908416.0, ans=0.125 2023-10-09 23:43:27,289 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2023-10-09 23:43:35,563 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=2908602.6666666665, ans=0.125 2023-10-09 23:43:49,235 INFO [train.py:1031] (2/4) Epoch 14, batch 38550, loss[loss=0.2185, simple_loss=0.2739, pruned_loss=0.05999, ctc_loss=0.1077, over 16817.00 frames. ], tot_loss[loss=0.2466, simple_loss=0.3036, pruned_loss=0.07011, ctc_loss=0.1236, over 3294065.42 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 1.0 2023-10-09 23:43:49,476 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:44:12,807 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2908742.6666666665, ans=0.1 2023-10-09 23:44:27,579 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2908789.3333333335, ans=0.125 2023-10-09 23:44:45,948 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2908836.0, ans=0.125 2023-10-09 23:44:48,216 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.444e+02 3.231e+02 3.757e+02 4.445e+02 8.253e+02, threshold=7.513e+02, percent-clipped=2.0 2023-10-09 23:44:49,801 INFO [train.py:1031] (2/4) Epoch 14, batch 38600, loss[loss=0.1933, simple_loss=0.2411, pruned_loss=0.053, ctc_loss=0.09903, over 16459.00 frames. ], tot_loss[loss=0.2428, simple_loss=0.2971, pruned_loss=0.06977, ctc_loss=0.1225, over 3290004.89 frames. ], batch size: 464, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:44:52,388 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=2908882.6666666665, ans=0.125 2023-10-09 23:44:52,446 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=2908882.6666666665, ans=0.2 2023-10-09 23:44:55,086 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0 2023-10-09 23:45:38,540 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2909069.3333333335, ans=0.0 2023-10-09 23:45:44,286 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.33 vs. limit=10.0 2023-10-09 23:45:51,534 INFO [train.py:1031] (2/4) Epoch 14, batch 38650, loss[loss=0.2272, simple_loss=0.2665, pruned_loss=0.06981, ctc_loss=0.1209, over 16575.00 frames. ], tot_loss[loss=0.2399, simple_loss=0.2926, pruned_loss=0.06927, ctc_loss=0.1215, over 3305291.02 frames. 
], batch size: 416, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:46:00,007 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2909116.0, ans=0.125 2023-10-09 23:46:03,598 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=22.5 2023-10-09 23:46:04,187 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2909162.6666666665, ans=0.1 2023-10-09 23:46:04,199 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=2909162.6666666665, ans=0.0 2023-10-09 23:46:11,676 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:46:20,728 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2909209.3333333335, ans=0.125 2023-10-09 23:46:23,230 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=22.5 2023-10-09 23:46:27,743 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2909256.0, ans=0.1 2023-10-09 23:46:28,166 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.85 vs. limit=22.5 2023-10-09 23:46:50,552 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=7.84 vs. limit=12.0 2023-10-09 23:46:54,839 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.443e+02 3.287e+02 3.697e+02 4.587e+02 9.347e+02, threshold=7.394e+02, percent-clipped=1.0 2023-10-09 23:46:54,866 INFO [train.py:1031] (2/4) Epoch 14, batch 38700, loss[loss=0.2046, simple_loss=0.2409, pruned_loss=0.06164, ctc_loss=0.1125, over 16663.00 frames. ], tot_loss[loss=0.2381, simple_loss=0.2904, pruned_loss=0.06874, ctc_loss=0.1209, over 3309948.99 frames. ], batch size: 70, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:47:13,267 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-10-09 23:47:14,454 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.85 vs. limit=6.0 2023-10-09 23:47:15,782 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2909396.0, ans=0.0 2023-10-09 23:47:38,340 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=2909489.3333333335, ans=0.0 2023-10-09 23:47:41,474 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.54 vs. limit=15.0 2023-10-09 23:47:56,598 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.27 vs. limit=12.0 2023-10-09 23:47:58,602 INFO [train.py:1031] (2/4) Epoch 14, batch 38750, loss[loss=0.2012, simple_loss=0.2814, pruned_loss=0.04445, ctc_loss=0.08032, over 16855.00 frames. 
], tot_loss[loss=0.2377, simple_loss=0.2913, pruned_loss=0.06803, ctc_loss=0.12, over 3304602.81 frames. ], batch size: 228, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:48:14,314 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2909629.3333333335, ans=0.0 2023-10-09 23:48:20,872 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=2909629.3333333335, ans=0.025 2023-10-09 23:49:01,691 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=2909816.0, ans=0.0 2023-10-09 23:49:01,732 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2909816.0, ans=0.2 2023-10-09 23:49:02,385 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.164e+02 3.418e+02 4.112e+02 5.352e+02 1.046e+03, threshold=8.225e+02, percent-clipped=4.0 2023-10-09 23:49:02,412 INFO [train.py:1031] (2/4) Epoch 14, batch 38800, loss[loss=0.2413, simple_loss=0.3464, pruned_loss=0.04955, ctc_loss=0.093, over 16306.00 frames. ], tot_loss[loss=0.2344, simple_loss=0.2939, pruned_loss=0.0646, ctc_loss=0.1145, over 3305056.90 frames. ], batch size: 463, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:49:08,947 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-10-09 23:49:45,267 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2909956.0, ans=0.125 2023-10-09 23:49:47,128 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2909956.0, ans=0.125 2023-10-09 23:50:03,003 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=2910002.6666666665, ans=0.95 2023-10-09 23:50:04,803 INFO [train.py:1031] (2/4) Epoch 14, batch 38850, loss[loss=0.2326, simple_loss=0.2932, pruned_loss=0.06293, ctc_loss=0.1154, over 16829.00 frames. ], tot_loss[loss=0.2348, simple_loss=0.2966, pruned_loss=0.06382, ctc_loss=0.1136, over 3296354.57 frames. ], batch size: 188, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:50:15,031 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=2910049.3333333335, ans=0.0 2023-10-09 23:50:18,910 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2910096.0, ans=0.125 2023-10-09 23:50:22,681 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=2910096.0, ans=0.2 2023-10-09 23:50:37,811 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=2910142.6666666665, ans=0.2 2023-10-09 23:50:46,685 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2910189.3333333335, ans=0.1 2023-10-09 23:51:06,342 INFO [train.py:1031] (2/4) Epoch 14, batch 38900, loss[loss=0.2224, simple_loss=0.2734, pruned_loss=0.06483, ctc_loss=0.1045, over 16769.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.2943, pruned_loss=0.06493, ctc_loss=0.1153, over 3302966.97 frames. 
], batch size: 151, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:51:06,677 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=2910282.6666666665, ans=0.125 2023-10-09 23:51:07,409 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2910282.6666666665, ans=0.1 2023-10-09 23:51:07,980 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.806e+02 3.456e+02 4.311e+02 5.586e+02 1.002e+03, threshold=8.621e+02, percent-clipped=2.0 2023-10-09 23:51:16,449 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2910282.6666666665, ans=0.1 2023-10-09 23:51:20,507 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=2910329.3333333335, ans=0.125 2023-10-09 23:51:28,626 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2910329.3333333335, ans=0.0 2023-10-09 23:51:48,573 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2910422.6666666665, ans=0.0 2023-10-09 23:52:09,369 INFO [train.py:1031] (2/4) Epoch 14, batch 38950, loss[loss=0.2079, simple_loss=0.2562, pruned_loss=0.05879, ctc_loss=0.1051, over 16833.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2903, pruned_loss=0.06558, ctc_loss=0.1161, over 3308802.24 frames. ], batch size: 189, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:52:15,910 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=2910516.0, ans=0.2 2023-10-09 23:52:19,413 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2910516.0, ans=0.125 2023-10-09 23:52:19,466 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=2910516.0, ans=0.0 2023-10-09 23:52:24,832 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2910562.6666666665, ans=0.1 2023-10-09 23:52:33,532 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2910562.6666666665, ans=0.0 2023-10-09 23:52:48,686 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2910656.0, ans=0.1 2023-10-09 23:53:02,120 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2910702.6666666665, ans=0.125 2023-10-09 23:53:07,438 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. 
limit=6.0 2023-10-09 23:53:08,839 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:53:09,872 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2910702.6666666665, ans=0.125 2023-10-09 23:53:11,009 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2910702.6666666665, ans=0.0 2023-10-09 23:53:14,615 INFO [train.py:1031] (2/4) Epoch 14, batch 39000, loss[loss=0.2208, simple_loss=0.2821, pruned_loss=0.05947, ctc_loss=0.1013, over 16791.00 frames. ], tot_loss[loss=0.2359, simple_loss=0.2905, pruned_loss=0.06692, ctc_loss=0.1184, over 3296178.75 frames. ], batch size: 140, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:53:14,615 INFO [train.py:1054] (2/4) Computing validation loss 2023-10-09 23:53:33,417 INFO [train.py:1063] (2/4) Epoch 14, validation: loss=0.2363, simple_loss=0.3035, pruned_loss=0.06558, ctc_loss=0.09478, over 1796401.00 frames. 2023-10-09 23:53:33,417 INFO [train.py:1064] (2/4) Maximum memory allocated so far is 14591MB 2023-10-09 23:53:35,531 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.616e+02 3.267e+02 3.662e+02 4.475e+02 7.642e+02, threshold=7.323e+02, percent-clipped=0.0 2023-10-09 23:54:07,530 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=2910842.6666666665, ans=0.0 2023-10-09 23:54:14,634 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=3.34 vs. limit=10.0 2023-10-09 23:54:16,754 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0 2023-10-09 23:54:34,752 INFO [train.py:1031] (2/4) Epoch 14, batch 39050, loss[loss=0.1899, simple_loss=0.2511, pruned_loss=0.04796, ctc_loss=0.08188, over 16944.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.2901, pruned_loss=0.06807, ctc_loss=0.1203, over 3298608.38 frames. ], batch size: 86, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:54:41,886 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=2910982.6666666665, ans=0.125 2023-10-09 23:54:52,868 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=2911029.3333333335, ans=0.0 2023-10-09 23:55:03,063 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2911076.0, ans=0.125 2023-10-09 23:55:03,087 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2911076.0, ans=0.1 2023-10-09 23:55:35,627 INFO [train.py:1031] (2/4) Epoch 14, batch 39100, loss[loss=0.2232, simple_loss=0.2742, pruned_loss=0.0634, ctc_loss=0.1134, over 16150.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2821, pruned_loss=0.06648, ctc_loss=0.117, over 3294127.33 frames. 
], batch size: 463, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:55:39,883 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.476e+02 3.243e+02 3.619e+02 4.225e+02 8.592e+02, threshold=7.239e+02, percent-clipped=2.0 2023-10-09 23:55:46,832 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.46 vs. limit=10.0 2023-10-09 23:55:48,800 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2911262.6666666665, ans=0.125 2023-10-09 23:55:55,328 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=22.5 2023-10-09 23:56:18,829 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=2911356.0, ans=0.125 2023-10-09 23:56:26,445 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2911402.6666666665, ans=0.1 2023-10-09 23:56:29,230 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2911402.6666666665, ans=0.125 2023-10-09 23:56:30,196 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=2911402.6666666665, ans=0.015 2023-10-09 23:56:33,239 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=2911402.6666666665, ans=0.125 2023-10-09 23:56:39,874 INFO [train.py:1031] (2/4) Epoch 14, batch 39150, loss[loss=0.2673, simple_loss=0.3472, pruned_loss=0.06756, ctc_loss=0.1305, over 16909.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.2878, pruned_loss=0.06672, ctc_loss=0.1177, over 3296669.71 frames. ], batch size: 258, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:57:16,477 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-09 23:57:29,193 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2911589.3333333335, ans=0.125 2023-10-09 23:57:33,413 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=2911636.0, ans=0.125 2023-10-09 23:57:44,713 INFO [train.py:1031] (2/4) Epoch 14, batch 39200, loss[loss=0.169, simple_loss=0.2347, pruned_loss=0.03831, ctc_loss=0.06663, over 16672.00 frames. ], tot_loss[loss=0.2367, simple_loss=0.2947, pruned_loss=0.06603, ctc_loss=0.1164, over 3296366.45 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 4.0 2023-10-09 23:57:49,007 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.698e+02 3.956e+02 5.075e+02 6.707e+02 1.311e+03, threshold=1.015e+03, percent-clipped=19.0 2023-10-09 23:58:29,982 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.85 vs. limit=12.0 2023-10-09 23:58:47,253 INFO [train.py:1031] (2/4) Epoch 14, batch 39250, loss[loss=0.1864, simple_loss=0.2409, pruned_loss=0.05, ctc_loss=0.07961, over 16817.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2911, pruned_loss=0.06448, ctc_loss=0.1122, over 3297136.92 frames. 
], batch size: 164, lr: 2.52e-03, grad_scale: 2.0 2023-10-09 23:59:20,338 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.89 vs. limit=15.0 2023-10-09 23:59:24,881 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2912009.3333333335, ans=0.125 2023-10-09 23:59:53,293 INFO [train.py:1031] (2/4) Epoch 14, batch 39300, loss[loss=0.1935, simple_loss=0.2559, pruned_loss=0.04963, ctc_loss=0.0798, over 16689.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.287, pruned_loss=0.06245, ctc_loss=0.108, over 3290346.25 frames. ], batch size: 151, lr: 2.52e-03, grad_scale: 4.0 2023-10-10 00:00:00,609 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.735e+02 3.228e+02 3.774e+02 4.947e+02 8.395e+02, threshold=7.547e+02, percent-clipped=0.0 2023-10-10 00:00:16,409 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2912196.0, ans=0.0 2023-10-10 00:00:41,851 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2912289.3333333335, ans=0.125 2023-10-10 00:00:46,251 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2912336.0, ans=0.125 2023-10-10 00:00:57,813 INFO [train.py:1031] (2/4) Epoch 14, batch 39350, loss[loss=0.1991, simple_loss=0.281, pruned_loss=0.04389, ctc_loss=0.07334, over 16650.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2864, pruned_loss=0.06014, ctc_loss=0.1042, over 3282846.46 frames. ], batch size: 151, lr: 2.52e-03, grad_scale: 2.0 2023-10-10 00:00:58,158 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-10 00:01:20,904 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=2912476.0, ans=0.0 2023-10-10 00:01:59,507 INFO [train.py:1031] (2/4) Epoch 14, batch 39400, loss[loss=0.2088, simple_loss=0.264, pruned_loss=0.05773, ctc_loss=0.09572, over 16698.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2859, pruned_loss=0.05961, ctc_loss=0.1039, over 3287315.94 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 2.0 2023-10-10 00:02:00,907 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2912616.0, ans=0.2 2023-10-10 00:02:03,102 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=2912616.0, ans=0.125 2023-10-10 00:02:06,854 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.362e+02 3.092e+02 4.162e+02 5.159e+02 1.181e+03, threshold=8.323e+02, percent-clipped=5.0 2023-10-10 00:02:18,433 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.28 vs. 
limit=15.0 2023-10-10 00:02:38,932 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=2912756.0, ans=0.2 2023-10-10 00:02:52,979 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=2912802.6666666665, ans=0.2 2023-10-10 00:02:59,393 INFO [train.py:1031] (2/4) Epoch 14, batch 39450, loss[loss=0.161, simple_loss=0.2213, pruned_loss=0.03785, ctc_loss=0.06251, over 16753.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2804, pruned_loss=0.05928, ctc_loss=0.1035, over 3293477.70 frames. ], batch size: 130, lr: 2.51e-03, grad_scale: 1.0 2023-10-10 00:03:14,935 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2912896.0, ans=0.5 2023-10-10 00:03:32,935 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2912942.6666666665, ans=0.1 2023-10-10 00:03:37,730 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=2912989.3333333335, ans=0.1 2023-10-10 00:03:45,387 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=2912989.3333333335, ans=15.0 2023-10-10 00:04:00,455 INFO [train.py:1031] (2/4) Epoch 14, batch 39500, loss[loss=0.1977, simple_loss=0.2663, pruned_loss=0.04744, ctc_loss=0.08539, over 16530.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2712, pruned_loss=0.05506, ctc_loss=0.09633, over 3293621.66 frames. ], batch size: 350, lr: 2.51e-03, grad_scale: 2.0 2023-10-10 00:04:02,515 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2913082.6666666665, ans=0.0 2023-10-10 00:04:04,434 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0 2023-10-10 00:04:04,981 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=2913082.6666666665, ans=0.125 2023-10-10 00:04:09,901 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=2913082.6666666665, ans=0.125 2023-10-10 00:04:10,626 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.656e+02 3.168e+02 3.988e+02 1.383e+03, threshold=6.335e+02, percent-clipped=1.0 2023-10-10 00:04:25,952 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2913176.0, ans=0.0 2023-10-10 00:05:01,683 INFO [train.py:1031] (2/4) Epoch 14, batch 39550, loss[loss=0.2366, simple_loss=0.2865, pruned_loss=0.06851, ctc_loss=0.124, over 16580.00 frames. ], tot_loss[loss=0.2125, simple_loss=0.2718, pruned_loss=0.05671, ctc_loss=0.09911, over 3306473.21 frames. 
], batch size: 416, lr: 2.51e-03, grad_scale: 2.0 2023-10-10 00:05:03,650 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2913316.0, ans=0.125 2023-10-10 00:05:19,999 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=2913362.6666666665, ans=0.0 2023-10-10 00:05:28,080 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2913409.3333333335, ans=0.2 2023-10-10 00:05:28,147 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=2913409.3333333335, ans=0.0 2023-10-10 00:05:31,212 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=2913409.3333333335, ans=0.125 2023-10-10 00:05:44,023 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2913456.0, ans=0.125 2023-10-10 00:05:55,459 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=2913502.6666666665, ans=0.125 2023-10-10 00:06:03,879 INFO [train.py:1031] (2/4) Epoch 14, batch 39600, loss[loss=0.256, simple_loss=0.3195, pruned_loss=0.07185, ctc_loss=0.1219, over 16800.00 frames. ], tot_loss[loss=0.2105, simple_loss=0.2725, pruned_loss=0.05497, ctc_loss=0.0965, over 3303143.37 frames. ], batch size: 95, lr: 2.51e-03, grad_scale: 4.0 2023-10-10 00:06:13,821 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.291e+02 3.076e+02 3.392e+02 3.895e+02 1.156e+03, threshold=6.785e+02, percent-clipped=2.0 2023-10-10 00:06:19,596 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2023-10-10 00:06:51,934 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-10 00:06:53,561 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.15 vs. limit=15.0 2023-10-10 00:07:03,945 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=2913736.0, ans=0.0 2023-10-10 00:07:06,372 INFO [train.py:1031] (2/4) Epoch 14, batch 39650, loss[loss=0.2843, simple_loss=0.3309, pruned_loss=0.08787, ctc_loss=0.1548, over 16782.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2786, pruned_loss=0.05867, ctc_loss=0.1025, over 3303378.01 frames. ], batch size: 329, lr: 2.51e-03, grad_scale: 2.0 2023-10-10 00:07:21,957 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2913829.3333333335, ans=0.125 2023-10-10 00:07:30,329 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2913829.3333333335, ans=0.0 2023-10-10 00:07:39,324 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.18 vs. 
limit=15.0 2023-10-10 00:07:54,679 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2913922.6666666665, ans=0.125 2023-10-10 00:08:07,308 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2913969.3333333335, ans=0.125 2023-10-10 00:08:09,798 INFO [train.py:1031] (2/4) Epoch 14, batch 39700, loss[loss=0.2882, simple_loss=0.3323, pruned_loss=0.09297, ctc_loss=0.1451, over 16641.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2857, pruned_loss=0.06307, ctc_loss=0.1097, over 3303869.76 frames. ], batch size: 110, lr: 2.51e-03, grad_scale: 4.0 2023-10-10 00:08:17,408 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2914016.0, ans=0.0 2023-10-10 00:08:18,448 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=2914016.0, ans=0.0 2023-10-10 00:08:21,353 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.803e+02 3.745e+02 4.250e+02 5.439e+02 1.201e+03, threshold=8.500e+02, percent-clipped=8.0 2023-10-10 00:08:25,580 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-10 00:08:27,301 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2914062.6666666665, ans=0.125 2023-10-10 00:08:28,978 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-10-10 00:08:32,668 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=2914062.6666666665, ans=0.2 2023-10-10 00:08:43,100 INFO [scaling.py:1069] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-10-10 00:08:58,301 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=2914156.0, ans=0.125 2023-10-10 00:09:05,089 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=2914202.6666666665, ans=0.2 2023-10-10 00:09:13,571 INFO [train.py:1031] (2/4) Epoch 14, batch 39750, loss[loss=0.2411, simple_loss=0.2695, pruned_loss=0.07843, ctc_loss=0.1397, over 16535.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2859, pruned_loss=0.06514, ctc_loss=0.1133, over 3302167.77 frames. ], batch size: 352, lr: 2.51e-03, grad_scale: 2.0 2023-10-10 00:09:20,345 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2914249.3333333335, ans=0.125 2023-10-10 00:09:26,637 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=2914296.0, ans=0.0 2023-10-10 00:09:26,755 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2914296.0, ans=0.125 2023-10-10 00:09:28,184 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.35 vs. 
limit=15.0 2023-10-10 00:09:34,107 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=2914296.0, ans=0.0 2023-10-10 00:09:46,855 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=2914342.6666666665, ans=0.0 2023-10-10 00:09:47,919 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=2914342.6666666665, ans=0.025 2023-10-10 00:09:49,699 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=2914389.3333333335, ans=0.2 2023-10-10 00:09:58,521 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=2914389.3333333335, ans=0.125 2023-10-10 00:10:12,038 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2914436.0, ans=0.0 2023-10-10 00:10:13,893 INFO [train.py:1031] (2/4) Epoch 14, batch 39800, loss[loss=0.1792, simple_loss=0.2322, pruned_loss=0.04785, ctc_loss=0.07631, over 16724.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2791, pruned_loss=0.06419, ctc_loss=0.1112, over 3301136.48 frames. ], batch size: 188, lr: 2.51e-03, grad_scale: 4.0 2023-10-10 00:10:20,692 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=2914482.6666666665, ans=0.2 2023-10-10 00:10:26,704 INFO [optim.py:471] (2/4) Clipping_scale=2.0, grad-norm quartiles 2.361e+02 3.167e+02 3.567e+02 4.087e+02 1.118e+03, threshold=7.135e+02, percent-clipped=1.0 2023-10-10 00:10:43,675 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=2914576.0, ans=0.125 2023-10-10 00:11:03,417 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=2914669.3333333335, ans=0.0 2023-10-10 00:11:07,236 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=2914669.3333333335, ans=0.125 2023-10-10 00:11:10,705 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=2914669.3333333335, ans=0.125 2023-10-10 00:11:12,993 INFO [scaling.py:979] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=22.5 2023-10-10 00:11:15,154 INFO [train.py:1031] (2/4) Epoch 14, batch 39850, loss[loss=0.1954, simple_loss=0.262, pruned_loss=0.04876, ctc_loss=0.07826, over 16907.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2728, pruned_loss=0.06363, ctc_loss=0.1104, over 3306578.99 frames. 
], batch size: 78, lr: 2.51e-03, grad_scale: 4.0 2023-10-10 00:11:15,532 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=2914716.0, ans=0.125 2023-10-10 00:11:22,911 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=2914716.0, ans=0.0 2023-10-10 00:11:30,044 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=2914762.6666666665, ans=0.125 2023-10-10 00:11:46,765 INFO [scaling.py:199] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2914809.3333333335, ans=0.0