2024-03-09 17:05:59,774 INFO [train.py:1065] (3/4) Training started
2024-03-09 17:05:59,774 INFO [train.py:1075] (3/4) Device: cuda:3
2024-03-09 17:05:59,855 INFO [lexicon.py:168] (3/4) Loading pre-compiled data/lang_char/Linv.pt
2024-03-09 17:05:59,869 INFO [train.py:1086] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2989b0b1186fa6022932804f5b39fbb2781ebf42', 'k2-git-date': 'Fri Nov 24 11:34:10 2023', 'lhotse-version': '1.22.0.dev+git.d8ed1bbb.dirty', 'torch-version': '1.11.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'dev/mdcc', 'icefall-git-sha1': '8b7ca604-clean', 'icefall-git-date': 'Sat Mar 9 14:09:58 2024', 'icefall-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/icefall-1.0-py3.9.egg', 'k2-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/k2-1.24.4.dev20231207+cuda10.2.torch1.11.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/lhotse-1.22.0.dev0+git.d8ed1bbb.dirty-py3.9.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-1207150844-f49d8c4f4-c49d5', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 31, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp'), 'lang_dir': PosixPath('data/lang_char'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 1, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'blank_id': 0, 'vocab_size': 4852}
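The dump above is the recipe's full hyperparameter set. The Zipformer-specific options are comma-separated strings carrying one value per encoder stack (e.g. 'num_encoder_layers': '2,2,3,4,3,2' pairs with 'encoder_dim': '192,256,384,512,384,256'). A minimal sketch of turning those strings into per-stack tuples; the helper name is ours, not necessarily what train.py uses:

```python
# Minimal sketch: parse the comma-separated hyperparameter strings from the
# dump above into per-stack tuples (the helper name is hypothetical).
def to_int_tuple(s: str) -> tuple:
    return tuple(int(v) for v in s.split(","))

num_encoder_layers = to_int_tuple("2,2,3,4,3,2")
downsampling_factor = to_int_tuple("1,2,4,8,4,2")
encoder_dim = to_int_tuple("192,256,384,512,384,256")

# Six stacks, each with its own depth, downsampling factor, and width.
assert len(num_encoder_layers) == len(downsampling_factor) == len(encoder_dim) == 6
```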
2024-03-09 17:05:59,869 INFO [train.py:1088] (3/4) About to create model
2024-03-09 17:06:00,577 INFO [train.py:1092] (3/4) Number of model parameters: 74470867
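A parameter count like the 74470867 above is conventionally obtained by summing element counts over model.parameters(); a self-contained sketch with a stand-in module:

```python
import torch

# Stand-in module; in the run above this would be the ~74.5M-parameter Zipformer.
model = torch.nn.Linear(512, 512)
num_param = sum(p.numel() for p in model.parameters())
print(f"Number of model parameters: {num_param}")
```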
2024-03-09 17:06:00,578 INFO [checkpoint.py:112] (3/4) Loading checkpoint from zipformer/exp/epoch-30.pt
2024-03-09 17:06:07,828 INFO [train.py:1107] (3/4) Using DDP
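The "(3/4)" prefix on every line is rank 3 of world_size 4, one process per GPU, and "Using DDP" marks the model being wrapped for synchronized gradients. A hedged sketch of the standard PyTorch pattern (it assumes the usual MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE environment variables are set by the launcher):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Sketch only: rank/device 3 mirrors the "(3/4)" and "Device: cuda:3" lines.
dist.init_process_group(backend="nccl")   # reads rank/world_size from the env
device = torch.device("cuda", 3)
model = model.to(device)                  # `model` assumed built earlier
model = DDP(model, device_ids=[3])
```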
2024-03-09 17:06:08,429 INFO [train.py:1119] (3/4) Loading optimizer state dict
2024-03-09 17:06:09,483 INFO [train.py:1127] (3/4) Loading scheduler state dict
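Because 'start_epoch' is 31 in the config dump, the run resumes from epoch-30.pt and restores optimizer and scheduler state before the first batch. A sketch of that resume step; the checkpoint key names are assumptions, not verified against checkpoint.py:

```python
import torch

# Sketch of resuming from the checkpoint named in the log; the key names
# ("model", "optimizer", "scheduler") are assumptions.
ckpt = torch.load("zipformer/exp/epoch-30.pt", map_location="cpu")
model.load_state_dict(ckpt["model"])          # objects assumed built earlier
optimizer.load_state_dict(ckpt["optimizer"])
scheduler.load_state_dict(ckpt["scheduler"])
```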
2024-03-09 17:06:09,484 INFO [asr_datamodule.py:368] (3/4) About to get train cuts
2024-03-09 17:06:09,530 INFO [asr_datamodule.py:376] (3/4) About to get valid cuts
2024-03-09 17:06:09,532 INFO [asr_datamodule.py:195] (3/4) About to get Musan cuts
2024-03-09 17:06:11,951 INFO [asr_datamodule.py:200] (3/4) Enable MUSAN
2024-03-09 17:06:11,951 INFO [asr_datamodule.py:223] (3/4) Enable SpecAugment
2024-03-09 17:06:11,951 INFO [asr_datamodule.py:224] (3/4) Time warp factor: 80
2024-03-09 17:06:11,952 INFO [asr_datamodule.py:234] (3/4) Num frame mask: 10
2024-03-09 17:06:11,952 INFO [asr_datamodule.py:247] (3/4) About to create train dataset
2024-03-09 17:06:12,773 INFO [asr_datamodule.py:273] (3/4) Using DynamicBucketingSampler.
2024-03-09 17:06:12,773 INFO [asr_datamodule.py:290] (3/4) About to create train dataloader
2024-03-09 17:06:12,773 INFO [asr_datamodule.py:315] (3/4) About to create dev dataset
2024-03-09 17:06:13,100 INFO [asr_datamodule.py:332] (3/4) About to create dev dataloader
2024-03-09 17:06:13,100 INFO [train.py:1205] (3/4) Loading grad scaler state dict
2024-03-09 17:06:53,813 INFO [train.py:997] (3/4) Epoch 31, batch 0, loss[loss=0.1304, simple_loss=0.2259, pruned_loss=0.0175, over 22859.00 frames. ], tot_loss[loss=0.1304, simple_loss=0.2259, pruned_loss=0.0175, over 22859.00 frames. ], batch size: 608, lr: 1.41e-02, grad_scale: 64.0
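A note on reading these loss lines: with 'simple_loss_scale': 0.5 in the config dump, the reported loss is consistent with loss = 0.5 * simple_loss + pruned_loss (0.5 * 0.2259 + 0.0175 ≈ 0.1304 for the batch above, and the validation line below checks out the same way). A quick check; the formula is our inference from the logged numbers, not taken from train.py:

```python
# Check the inferred combination against the batch-0 line above.
simple_loss, pruned_loss = 0.2259, 0.0175
loss = 0.5 * simple_loss + pruned_loss   # simple_loss_scale = 0.5
print(f"{loss:.4f}")                     # ~0.1304, matching the log
```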
2024-03-09 17:06:53,813 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:07:03,243 INFO [train.py:1029] (3/4) Epoch 31, validation: loss=0.2089, simple_loss=0.3019, pruned_loss=0.05794, over 452978.00 frames.
2024-03-09 17:07:03,244 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 24707MB
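The memory line comes from PyTorch's CUDA allocator statistics; the peak is tracked per device since program start (or the last reset). A sketch:

```python
import torch

# Sketch: peak allocator usage for the device in this log ("cuda:3").
peak_mb = torch.cuda.max_memory_allocated(torch.device("cuda", 3)) // 2**20
print(f"Maximum memory allocated so far is {peak_mb}MB")
```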
2024-03-09 17:07:04,460 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0
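The Whitening lines here and throughout the log are diagnostics from the recipe's scaling.py: each whitening module monitors how far the channel covariance of its activations is from a multiple of the identity, and an entry is logged when the metric approaches its configured limit. A rough reconstruction of such a metric, which equals 1.0 for perfectly white activations and grows as channels become correlated or unbalanced (not guaranteed to match scaling.py exactly):

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations; rough reconstruction only.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]        # (C, C) channel covariance
    c = cov.shape[0]
    # 1.0 when cov is a multiple of the identity; larger otherwise.
    return (cov ** 2).sum() / (cov.diag().mean() ** 2 * c)

print(whitening_metric(torch.randn(10000, 256)))  # i.i.d. input -> close to 1
```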
2024-03-09 17:07:07,758 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0
2024-03-09 17:07:17,162 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0
2024-03-09 17:07:52,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=31800.0, ans=0.125
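The ScheduledFloat lines record module constants (dropout probabilities, skip rates, balancer limits) whose current values ("ans") are scheduled as piecewise-linear functions of batch_count, typically decaying as training progresses. An illustrative sketch of such a schedule; the breakpoints here are made up, not the recipe's:

```python
# Illustrative piecewise-linear schedule over batch_count (breakpoints are
# invented for the example; each named constant has its own schedule).
def scheduled_float(batch_count: float, points=((0.0, 0.3), (20000.0, 0.1))):
    (x0, y0), (x1, y1) = points
    if batch_count <= x0:
        return y0
    if batch_count >= x1:
        return y1
    t = (batch_count - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

print(scheduled_float(31800.0))  # past the last breakpoint -> 0.1
```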
2024-03-09 17:07:54,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=31800.0, ans=0.05
2024-03-09 17:07:58,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=31800.0, ans=0.125
2024-03-09 17:08:06,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=31866.666666666668, ans=0.125
2024-03-09 17:08:12,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=31866.666666666668, ans=0.125
2024-03-09 17:08:17,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=31866.666666666668, ans=0.125
2024-03-09 17:08:21,814 INFO [train.py:997] (3/4) Epoch 31, batch 50, loss[loss=0.1526, simple_loss=0.2376, pruned_loss=0.03384, over 23894.00 frames. ], tot_loss[loss=0.1462, simple_loss=0.2352, pruned_loss=0.02855, over 1067497.66 frames. ], batch size: 153, lr: 1.41e-02, grad_scale: 64.0
2024-03-09 17:08:22,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31933.333333333332, ans=0.1
2024-03-09 17:08:22,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=31933.333333333332, ans=0.125
2024-03-09 17:08:25,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=31933.333333333332, ans=0.035
2024-03-09 17:08:49,153 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=12.0
2024-03-09 17:08:54,736 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.909e+01 7.298e+01 7.941e+01 8.893e+01 1.039e+02, threshold=1.588e+02, percent-clipped=0.0
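These periodic WARNING lines summarize the recent distribution of gradient norms as min/25%/50%/75%/max, plus the clipping threshold and how often it was hit. In every warning in this log the threshold equals 2.0 times the median (e.g. 2 x 7.941e+01 = 1.588e+02 here), consistent with Clipping_scale=2.0; a self-contained sketch of producing such a summary (the exact rule used by the recipe's optimizer is our inference, not verified against optim.py):

```python
import torch

# The five quartile values from the warning above, as a stand-in history.
grad_norms = torch.tensor([59.09, 72.98, 79.41, 88.93, 103.9])
q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
threshold = 2.0 * q[2]  # Clipping_scale * median, matching 1.588e+02
percent_clipped = (grad_norms > threshold).float().mean() * 100
print(f"quartiles={q.tolist()}, threshold={threshold:.3e}, "
      f"percent-clipped={percent_clipped:.1f}")
```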
|
2024-03-09 17:09:14,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=32066.666666666668, ans=0.125 |
|
2024-03-09 17:09:20,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=32133.333333333332, ans=0.125 |
|
2024-03-09 17:09:22,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=32133.333333333332, ans=0.2 |
|
2024-03-09 17:09:35,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=32200.0, ans=0.0038695652173913048 |
|
2024-03-09 17:09:48,298 INFO [train.py:997] (3/4) Epoch 31, batch 100, loss[loss=0.1465, simple_loss=0.2404, pruned_loss=0.02631, over 24124.00 frames. ], tot_loss[loss=0.1466, simple_loss=0.2362, pruned_loss=0.02852, over 1889551.81 frames. ], batch size: 326, lr: 1.40e-02, grad_scale: 64.0 |
|
2024-03-09 17:10:07,962 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.69 vs. limit=15.0 |
|
2024-03-09 17:10:57,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=32533.333333333332, ans=0.003797101449275363 |
|
2024-03-09 17:11:07,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=32600.0, ans=0.95 |
|
2024-03-09 17:11:08,727 INFO [train.py:997] (3/4) Epoch 31, batch 150, loss[loss=0.144, simple_loss=0.2385, pruned_loss=0.02476, over 24218.00 frames. ], tot_loss[loss=0.1463, simple_loss=0.236, pruned_loss=0.0283, over 2512128.61 frames. ], batch size: 295, lr: 1.40e-02, grad_scale: 64.0 |
|
2024-03-09 17:12:00,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=32653.333333333332, ans=0.125 |
|
2024-03-09 17:12:06,856 INFO [train.py:997] (3/4) Epoch 32, batch 0, loss[loss=0.1392, simple_loss=0.2259, pruned_loss=0.02618, over 24189.00 frames. ], tot_loss[loss=0.1392, simple_loss=0.2259, pruned_loss=0.02618, over 24189.00 frames. ], batch size: 188, lr: 1.38e-02, grad_scale: 64.0 |
|
2024-03-09 17:12:06,857 INFO [train.py:1020] (3/4) Computing validation loss |
|
2024-03-09 17:12:16,508 INFO [train.py:1029] (3/4) Epoch 32, validation: loss=0.2101, simple_loss=0.3027, pruned_loss=0.0588, over 452978.00 frames. |
|
2024-03-09 17:12:16,509 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB |
|
2024-03-09 17:12:18,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=32653.333333333332, ans=0.125 |
|
2024-03-09 17:12:20,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=32653.333333333332, ans=0.125 |
|
2024-03-09 17:12:32,055 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.203e+01 7.071e+01 7.685e+01 8.593e+01 1.169e+02, threshold=1.537e+02, percent-clipped=0.0 |
|
2024-03-09 17:12:57,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=32786.666666666664, ans=0.0 |
|
2024-03-09 17:13:04,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=32853.333333333336, ans=0.125 |
|
2024-03-09 17:13:04,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=32853.333333333336, ans=0.125 |
|
2024-03-09 17:13:27,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32920.0, ans=0.1 |
|
2024-03-09 17:13:30,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=32920.0, ans=0.2 |
|
2024-03-09 17:13:34,611 INFO [train.py:997] (3/4) Epoch 32, batch 50, loss[loss=0.1395, simple_loss=0.2327, pruned_loss=0.02316, over 24223.00 frames. ], tot_loss[loss=0.1444, simple_loss=0.2332, pruned_loss=0.02779, over 1062126.86 frames. ], batch size: 295, lr: 1.38e-02, grad_scale: 64.0 |
|
2024-03-09 17:14:03,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=33053.333333333336, ans=0.02 |
|
2024-03-09 17:14:19,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=33120.0, ans=0.003669565217391305 |
|
2024-03-09 17:14:28,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=33186.666666666664, ans=0.125 |
|
2024-03-09 17:14:59,393 INFO [train.py:997] (3/4) Epoch 32, batch 100, loss[loss=0.1445, simple_loss=0.228, pruned_loss=0.03048, over 23594.00 frames. ], tot_loss[loss=0.1437, simple_loss=0.2328, pruned_loss=0.02729, over 1881462.97 frames. ], batch size: 128, lr: 1.37e-02, grad_scale: 64.0 |
|
2024-03-09 17:15:15,496 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.885e+01 7.174e+01 7.568e+01 8.159e+01 1.038e+02, threshold=1.514e+02, percent-clipped=0.0 |
|
2024-03-09 17:15:16,785 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0 |
|
2024-03-09 17:15:24,306 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.99 vs. limit=10.0 |
|
2024-03-09 17:15:29,949 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0 |
|
2024-03-09 17:15:32,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=33453.333333333336, ans=0.125 |
|
2024-03-09 17:15:38,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=33453.333333333336, ans=0.125 |
|
2024-03-09 17:15:44,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=33453.333333333336, ans=0.125 |
|
2024-03-09 17:15:55,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=33520.0, ans=0.125 |
|
2024-03-09 17:16:19,711 INFO [train.py:997] (3/4) Epoch 32, batch 150, loss[loss=0.1458, simple_loss=0.236, pruned_loss=0.02785, over 24173.00 frames. ], tot_loss[loss=0.1437, simple_loss=0.2326, pruned_loss=0.02735, over 2518627.96 frames. ], batch size: 217, lr: 1.37e-02, grad_scale: 64.0 |
|
2024-03-09 17:17:08,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=33706.666666666664, ans=0.1 |
|
2024-03-09 17:17:14,939 INFO [train.py:997] (3/4) Epoch 33, batch 0, loss[loss=0.1338, simple_loss=0.2188, pruned_loss=0.02447, over 24247.00 frames. ], tot_loss[loss=0.1338, simple_loss=0.2188, pruned_loss=0.02447, over 24247.00 frames. ], batch size: 217, lr: 1.35e-02, grad_scale: 64.0 |
|
2024-03-09 17:17:14,940 INFO [train.py:1020] (3/4) Computing validation loss |
|
2024-03-09 17:17:24,826 INFO [train.py:1029] (3/4) Epoch 33, validation: loss=0.2104, simple_loss=0.3043, pruned_loss=0.05821, over 452978.00 frames. |
|
2024-03-09 17:17:24,826 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB |
|
2024-03-09 17:17:30,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=33706.666666666664, ans=0.2 |
|
2024-03-09 17:17:51,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=33773.333333333336, ans=0.125 |
|
2024-03-09 17:17:53,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=33773.333333333336, ans=0.125 |
|
2024-03-09 17:18:02,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=33840.0, ans=0.125 |
|
2024-03-09 17:18:15,959 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 |
|
2024-03-09 17:18:27,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=33973.333333333336, ans=10.0 |
|
2024-03-09 17:18:29,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=33973.333333333336, ans=0.125 |
|
2024-03-09 17:18:40,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=33973.333333333336, ans=0.1 |
|
2024-03-09 17:18:42,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=34040.0, ans=0.0 |
|
2024-03-09 17:18:43,168 INFO [train.py:997] (3/4) Epoch 33, batch 50, loss[loss=0.1443, simple_loss=0.2367, pruned_loss=0.02593, over 24153.00 frames. ], tot_loss[loss=0.1413, simple_loss=0.2307, pruned_loss=0.02598, over 1070736.99 frames. ], batch size: 345, lr: 1.35e-02, grad_scale: 64.0 |
|
2024-03-09 17:18:45,940 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 |
|
2024-03-09 17:18:46,184 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.045e+01 7.058e+01 7.697e+01 8.414e+01 1.529e+02, threshold=1.539e+02, percent-clipped=1.0 |
|
2024-03-09 17:18:49,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=34040.0, ans=0.125 |
|
2024-03-09 17:19:23,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=34173.333333333336, ans=0.125 |
|
2024-03-09 17:19:28,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=34173.333333333336, ans=0.125 |
|
2024-03-09 17:19:53,913 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 |
|
2024-03-09 17:20:08,479 INFO [train.py:997] (3/4) Epoch 33, batch 100, loss[loss=0.14, simple_loss=0.2332, pruned_loss=0.0234, over 24264.00 frames. ], tot_loss[loss=0.1423, simple_loss=0.2313, pruned_loss=0.02668, over 1886315.86 frames. ], batch size: 267, lr: 1.35e-02, grad_scale: 64.0 |
|
2024-03-09 17:20:19,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=34373.333333333336, ans=0.125 |
|
2024-03-09 17:20:21,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=34373.333333333336, ans=0.125 |
|
2024-03-09 17:20:22,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34440.0, ans=0.1 |
|
2024-03-09 17:20:27,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=34440.0, ans=0.0033826086956521735 |
|
2024-03-09 17:20:59,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=34573.333333333336, ans=0.003353623188405797 |
|
2024-03-09 17:21:28,186 INFO [train.py:997] (3/4) Epoch 33, batch 150, loss[loss=0.1482, simple_loss=0.2403, pruned_loss=0.02803, over 24108.00 frames. ], tot_loss[loss=0.1436, simple_loss=0.2334, pruned_loss=0.02687, over 2524659.63 frames. ], batch size: 366, lr: 1.34e-02, grad_scale: 64.0 |
|
2024-03-09 17:21:31,130 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.628e+01 7.574e+01 8.231e+01 9.009e+01 1.365e+02, threshold=1.646e+02, percent-clipped=0.0 |
|
2024-03-09 17:21:37,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=34706.666666666664, ans=0.0033246376811594206 |
|
2024-03-09 17:22:22,790 INFO [train.py:997] (3/4) Epoch 34, batch 0, loss[loss=0.1447, simple_loss=0.2278, pruned_loss=0.03075, over 24317.00 frames. ], tot_loss[loss=0.1447, simple_loss=0.2278, pruned_loss=0.03075, over 24317.00 frames. ], batch size: 208, lr: 1.32e-02, grad_scale: 64.0 |
|
2024-03-09 17:22:22,791 INFO [train.py:1020] (3/4) Computing validation loss |
|
2024-03-09 17:22:32,283 INFO [train.py:1029] (3/4) Epoch 34, validation: loss=0.2117, simple_loss=0.3053, pruned_loss=0.0591, over 452978.00 frames. |
|
2024-03-09 17:22:32,284 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB |
|
2024-03-09 17:22:37,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=34760.0, ans=0.125 |
|
2024-03-09 17:22:37,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=34760.0, ans=0.0 |
|
2024-03-09 17:23:03,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=34893.333333333336, ans=0.0 |
|
2024-03-09 17:23:23,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=34960.0, ans=0.125 |
|
2024-03-09 17:23:31,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=34960.0, ans=0.003269565217391304 |
|
2024-03-09 17:23:49,725 INFO [train.py:997] (3/4) Epoch 34, batch 50, loss[loss=0.1437, simple_loss=0.2395, pruned_loss=0.02396, over 23916.00 frames. ], tot_loss[loss=0.1412, simple_loss=0.2305, pruned_loss=0.02589, over 1068620.07 frames. ], batch size: 387, lr: 1.32e-02, grad_scale: 128.0 |
|
2024-03-09 17:24:15,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=35160.0, ans=0.0 |
|
2024-03-09 17:24:17,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=35160.0, ans=0.003226086956521739 |
|
2024-03-09 17:24:21,400 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=15.0 |
|
2024-03-09 17:24:26,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=35226.666666666664, ans=0.09899494936611666 |
|
2024-03-09 17:24:31,974 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=22.5 |
|
2024-03-09 17:24:37,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=35226.666666666664, ans=0.125 |
|
2024-03-09 17:24:46,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35293.333333333336, ans=0.1 |
|
2024-03-09 17:24:54,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=35293.333333333336, ans=0.0 |
|
2024-03-09 17:25:04,712 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.790e+01 6.987e+01 7.379e+01 8.041e+01 1.553e+02, threshold=1.476e+02, percent-clipped=0.0 |
|
2024-03-09 17:25:13,959 INFO [train.py:997] (3/4) Epoch 34, batch 100, loss[loss=0.1508, simple_loss=0.2387, pruned_loss=0.03142, over 24214.00 frames. ], tot_loss[loss=0.1421, simple_loss=0.2317, pruned_loss=0.02628, over 1886370.96 frames. ], batch size: 198, lr: 1.32e-02, grad_scale: 128.0 |
|
2024-03-09 17:25:31,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=35493.333333333336, ans=0.125 |
|
2024-03-09 17:25:34,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=35493.333333333336, ans=0.125 |
|
2024-03-09 17:25:44,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35560.0, ans=0.1 |
|
2024-03-09 17:26:24,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=35693.333333333336, ans=0.125 |
|
2024-03-09 17:26:32,970 INFO [train.py:997] (3/4) Epoch 34, batch 150, loss[loss=0.1418, simple_loss=0.2355, pruned_loss=0.024, over 24099.00 frames. ], tot_loss[loss=0.1434, simple_loss=0.2326, pruned_loss=0.02708, over 2530717.88 frames. ], batch size: 345, lr: 1.32e-02, grad_scale: 128.0 |
|
2024-03-09 17:27:26,418 INFO [train.py:997] (3/4) Epoch 35, batch 0, loss[loss=0.1572, simple_loss=0.2507, pruned_loss=0.03183, over 23655.00 frames. ], tot_loss[loss=0.1572, simple_loss=0.2507, pruned_loss=0.03183, over 23655.00 frames. ], batch size: 485, lr: 1.30e-02, grad_scale: 128.0 |
|
2024-03-09 17:27:26,418 INFO [train.py:1020] (3/4) Computing validation loss |
|
2024-03-09 17:27:38,585 INFO [train.py:1029] (3/4) Epoch 35, validation: loss=0.2098, simple_loss=0.3027, pruned_loss=0.05849, over 452978.00 frames. |
|
2024-03-09 17:27:38,586 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB |
|
2024-03-09 17:27:56,501 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 |
|
2024-03-09 17:28:08,890 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 |
|
2024-03-09 17:28:15,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=35946.666666666664, ans=0.2 |
|
2024-03-09 17:28:16,328 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 |
|
2024-03-09 17:28:25,599 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 |
|
2024-03-09 17:28:30,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=36013.333333333336, ans=0.5 |
|
2024-03-09 17:28:34,546 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.276e+01 7.140e+01 7.953e+01 8.912e+01 1.249e+02, threshold=1.591e+02, percent-clipped=0.0 |
|
2024-03-09 17:28:58,519 INFO [train.py:997] (3/4) Epoch 35, batch 50, loss[loss=0.1423, simple_loss=0.2305, pruned_loss=0.02702, over 24238.00 frames. ], tot_loss[loss=0.1403, simple_loss=0.2286, pruned_loss=0.02598, over 1073848.84 frames. ], batch size: 241, lr: 1.30e-02, grad_scale: 128.0 |
|
2024-03-09 17:28:58,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=36146.666666666664, ans=0.125 |
|
2024-03-09 17:29:24,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=36213.333333333336, ans=0.0 |
|
2024-03-09 17:29:26,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=36213.333333333336, ans=0.125 |
|
2024-03-09 17:29:48,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=36346.666666666664, ans=0.125 |
|
2024-03-09 17:30:05,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=36413.333333333336, ans=0.002953623188405796 |
|
2024-03-09 17:30:15,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=36413.333333333336, ans=0.002953623188405796 |
|
2024-03-09 17:30:18,437 INFO [train.py:997] (3/4) Epoch 35, batch 100, loss[loss=0.1327, simple_loss=0.219, pruned_loss=0.02318, over 23649.00 frames. ], tot_loss[loss=0.1402, simple_loss=0.2289, pruned_loss=0.02575, over 1901305.08 frames. ], batch size: 128, lr: 1.29e-02, grad_scale: 128.0 |
|
2024-03-09 17:30:36,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=36546.666666666664, ans=0.035 |
|
2024-03-09 17:30:37,534 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0 |
|
2024-03-09 17:30:49,824 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 |
|
2024-03-09 17:31:05,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=36680.0, ans=0.125 |
|
2024-03-09 17:31:18,165 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.800e+01 7.204e+01 7.789e+01 8.601e+01 1.817e+02, threshold=1.558e+02, percent-clipped=1.0 |
|
2024-03-09 17:31:38,613 INFO [train.py:997] (3/4) Epoch 35, batch 150, loss[loss=0.1292, simple_loss=0.2144, pruned_loss=0.02196, over 19900.00 frames. ], tot_loss[loss=0.1413, simple_loss=0.2305, pruned_loss=0.02599, over 2517737.96 frames. ], batch size: 60, lr: 1.29e-02, grad_scale: 64.0 |
|
2024-03-09 17:32:32,816 INFO [train.py:997] (3/4) Epoch 36, batch 0, loss[loss=0.1595, simple_loss=0.2567, pruned_loss=0.03116, over 23607.00 frames. ], tot_loss[loss=0.1595, simple_loss=0.2567, pruned_loss=0.03116, over 23607.00 frames. ], batch size: 486, lr: 1.27e-02, grad_scale: 64.0 |
|
2024-03-09 17:32:32,817 INFO [train.py:1020] (3/4) Computing validation loss |
|
2024-03-09 17:32:40,710 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([0.9559, 2.3706, 2.5634, 2.5911], device='cuda:3') |
|
2024-03-09 17:32:42,864 INFO [train.py:1029] (3/4) Epoch 36, validation: loss=0.212, simple_loss=0.307, pruned_loss=0.05847, over 452978.00 frames. |
|
2024-03-09 17:32:42,864 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB |
|
2024-03-09 17:32:52,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=36866.666666666664, ans=0.125 |
|
2024-03-09 17:32:54,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=36866.666666666664, ans=0.002855072463768117 |
|
2024-03-09 17:33:03,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=36933.333333333336, ans=0.2 |
|
2024-03-09 17:33:21,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=37000.0, ans=0.1 |
|
2024-03-09 17:33:30,096 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.01 vs. limit=10.0 |
|
2024-03-09 17:33:35,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=37066.666666666664, ans=0.125 |
|
2024-03-09 17:34:10,525 INFO [train.py:997] (3/4) Epoch 36, batch 50, loss[loss=0.1393, simple_loss=0.2257, pruned_loss=0.02643, over 24229.00 frames. ], tot_loss[loss=0.141, simple_loss=0.2306, pruned_loss=0.02566, over 1083429.72 frames. ], batch size: 229, lr: 1.27e-02, grad_scale: 64.0 |
|
2024-03-09 17:34:17,641 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.91 vs. limit=6.0 |
|
2024-03-09 17:34:20,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=37200.0, ans=0.0 |
|
2024-03-09 17:34:23,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=37200.0, ans=0.0027826086956521745 |
|
2024-03-09 17:34:30,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=37266.666666666664, ans=0.125 |
|
2024-03-09 17:34:32,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37266.666666666664, ans=0.1 |
|
2024-03-09 17:34:37,822 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.18 vs. limit=15.0 |
|
2024-03-09 17:34:42,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=37333.333333333336, ans=0.125 |
|
2024-03-09 17:34:55,597 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.060e+01 6.975e+01 7.752e+01 8.346e+01 1.468e+02, threshold=1.550e+02, percent-clipped=0.0 |
|
2024-03-09 17:35:13,888 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=15.0 |
|
2024-03-09 17:35:16,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=37466.666666666664, ans=0.125 |
|
2024-03-09 17:35:16,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=37466.666666666664, ans=0.1 |
|
2024-03-09 17:35:19,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=37466.666666666664, ans=0.0 |
|
2024-03-09 17:35:28,455 INFO [train.py:997] (3/4) Epoch 36, batch 100, loss[loss=0.1411, simple_loss=0.2352, pruned_loss=0.02345, over 24173.00 frames. ], tot_loss[loss=0.1414, simple_loss=0.2317, pruned_loss=0.02551, over 1901434.56 frames. ], batch size: 327, lr: 1.27e-02, grad_scale: 64.0 |
|
2024-03-09 17:35:41,988 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=22.5 |
|
2024-03-09 17:35:49,914 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 |
|
2024-03-09 17:35:58,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=37600.0, ans=0.125 |
|
2024-03-09 17:36:17,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=37733.333333333336, ans=0.0 |
|
2024-03-09 17:36:20,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=37733.333333333336, ans=0.0 |
|
2024-03-09 17:36:34,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=37800.0, ans=0.1 |
|
2024-03-09 17:36:36,452 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.70 vs. limit=22.5 |
|
2024-03-09 17:36:40,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=37800.0, ans=0.2 |
|
2024-03-09 17:36:46,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=37800.0, ans=0.5 |
|
2024-03-09 17:36:50,960 INFO [train.py:997] (3/4) Epoch 36, batch 150, loss[loss=0.154, simple_loss=0.2485, pruned_loss=0.02981, over 23956.00 frames. ], tot_loss[loss=0.1407, simple_loss=0.2314, pruned_loss=0.025, over 2528549.44 frames. ], batch size: 416, lr: 1.27e-02, grad_scale: 64.0 |
|
2024-03-09 17:37:46,094 INFO [train.py:997] (3/4) Epoch 37, batch 0, loss[loss=0.1261, simple_loss=0.2107, pruned_loss=0.02078, over 23768.00 frames. ], tot_loss[loss=0.1261, simple_loss=0.2107, pruned_loss=0.02078, over 23768.00 frames. ], batch size: 117, lr: 1.25e-02, grad_scale: 64.0 |
|
2024-03-09 17:37:46,095 INFO [train.py:1020] (3/4) Computing validation loss |
|
2024-03-09 17:37:55,593 INFO [train.py:1029] (3/4) Epoch 37, validation: loss=0.2112, simple_loss=0.3044, pruned_loss=0.05893, over 452978.00 frames. |
|
2024-03-09 17:37:55,594 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB |
|
2024-03-09 17:37:58,206 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5 |
|
2024-03-09 17:38:02,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=37920.0, ans=0.125 |
|
2024-03-09 17:38:21,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=37986.666666666664, ans=0.0026115942028985505 |
|
2024-03-09 17:38:25,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=37986.666666666664, ans=0.0 |
|
2024-03-09 17:38:30,964 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.112e+01 7.137e+01 7.682e+01 8.524e+01 1.300e+02, threshold=1.536e+02, percent-clipped=0.0 |
|
2024-03-09 17:38:35,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=38053.333333333336, ans=0.0 |
|
2024-03-09 17:39:13,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=38186.666666666664, ans=0.125 |
|
2024-03-09 17:39:18,716 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 |
|
2024-03-09 17:39:20,073 INFO [train.py:997] (3/4) Epoch 37, batch 50, loss[loss=0.1372, simple_loss=0.2284, pruned_loss=0.02299, over 24264.00 frames. ], tot_loss[loss=0.1389, simple_loss=0.2299, pruned_loss=0.02396, over 1063994.42 frames. ], batch size: 254, lr: 1.25e-02, grad_scale: 64.0 |
|
2024-03-09 17:39:25,531 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=15.0 |
|
2024-03-09 17:39:27,193 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.11 vs. limit=12.0 |
|
2024-03-09 17:39:40,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=38320.0, ans=0.125 |
|
2024-03-09 17:39:42,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=38320.0, ans=0.125 |
|
2024-03-09 17:39:54,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=38386.666666666664, ans=0.95 |
|
2024-03-09 17:40:03,083 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0 |
|
2024-03-09 17:40:16,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=38453.333333333336, ans=0.2 |
|
2024-03-09 17:40:34,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=38520.0, ans=0.125 |
|
2024-03-09 17:40:40,615 INFO [train.py:997] (3/4) Epoch 37, batch 100, loss[loss=0.1576, simple_loss=0.2546, pruned_loss=0.03027, over 23809.00 frames. ], tot_loss[loss=0.1395, simple_loss=0.2304, pruned_loss=0.02426, over 1881976.31 frames. ], batch size: 447, lr: 1.25e-02, grad_scale: 64.0 |
|
2024-03-09 17:41:13,701 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 |
|
2024-03-09 17:41:15,923 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.868e+01 6.991e+01 7.571e+01 8.226e+01 1.121e+02, threshold=1.514e+02, percent-clipped=0.0 |
|
2024-03-09 17:42:00,691 INFO [train.py:997] (3/4) Epoch 37, batch 150, loss[loss=0.1369, simple_loss=0.231, pruned_loss=0.02144, over 24177.00 frames. ], tot_loss[loss=0.1398, simple_loss=0.2302, pruned_loss=0.02471, over 2517978.69 frames. ], batch size: 345, lr: 1.24e-02, grad_scale: 64.0 |
|
2024-03-09 17:42:52,945 INFO [train.py:997] (3/4) Epoch 38, batch 0, loss[loss=0.14, simple_loss=0.2301, pruned_loss=0.02497, over 24294.00 frames. ], tot_loss[loss=0.14, simple_loss=0.2301, pruned_loss=0.02497, over 24294.00 frames. ], batch size: 281, lr: 1.23e-02, grad_scale: 64.0 |
|
2024-03-09 17:42:52,946 INFO [train.py:1020] (3/4) Computing validation loss |
|
2024-03-09 17:43:02,281 INFO [train.py:1029] (3/4) Epoch 38, validation: loss=0.2136, simple_loss=0.3079, pruned_loss=0.05959, over 452978.00 frames. |
|
2024-03-09 17:43:02,281 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB |
|
2024-03-09 17:43:13,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=38973.333333333336, ans=0.125 |
|
2024-03-09 17:43:17,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=38973.333333333336, ans=0.0 |
|
2024-03-09 17:43:21,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=39040.0, ans=0.125 |
|
2024-03-09 17:43:21,986 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.71 vs. limit=15.0 |
|
2024-03-09 17:43:39,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=39106.666666666664, ans=0.125 |
|
2024-03-09 17:43:43,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=39106.666666666664, ans=0.2 |
|
2024-03-09 17:44:27,811 INFO [train.py:997] (3/4) Epoch 38, batch 50, loss[loss=0.1516, simple_loss=0.2365, pruned_loss=0.03341, over 24172.00 frames. ], tot_loss[loss=0.1376, simple_loss=0.2267, pruned_loss=0.02422, over 1065371.03 frames. ], batch size: 217, lr: 1.22e-02, grad_scale: 64.0 |
|
2024-03-09 17:44:31,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=39306.666666666664, ans=0.0023246376811594206 |
|
2024-03-09 17:44:46,022 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.38 vs. limit=15.0 |
|
2024-03-09 17:44:48,016 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.028e+01 7.170e+01 7.896e+01 8.779e+01 1.113e+02, threshold=1.579e+02, percent-clipped=0.0 |
|
2024-03-09 17:44:48,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=39373.333333333336, ans=0.125 |
|
2024-03-09 17:44:49,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39373.333333333336, ans=0.1 |
|
2024-03-09 17:45:10,429 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.88 vs. limit=15.0 |
|
2024-03-09 17:45:18,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=39506.666666666664, ans=0.04949747468305833 |
|
2024-03-09 17:45:34,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=39573.333333333336, ans=0.125 |
|
2024-03-09 17:45:46,175 INFO [train.py:997] (3/4) Epoch 38, batch 100, loss[loss=0.1162, simple_loss=0.2123, pruned_loss=0.01007, over 21384.00 frames. ], tot_loss[loss=0.1405, simple_loss=0.2295, pruned_loss=0.02576, over 1882146.18 frames. ], batch size: 718, lr: 1.22e-02, grad_scale: 64.0 |
|
2024-03-09 17:46:30,015 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.67 vs. limit=22.5 |
|
2024-03-09 17:46:43,686 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0 |
|
2024-03-09 17:46:47,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=39840.0, ans=0.0 |
|
2024-03-09 17:47:07,594 INFO [train.py:997] (3/4) Epoch 38, batch 150, loss[loss=0.1462, simple_loss=0.2428, pruned_loss=0.0248, over 23946.00 frames. ], tot_loss[loss=0.1399, simple_loss=0.2299, pruned_loss=0.02494, over 2501198.54 frames. ], batch size: 387, lr: 1.22e-02, grad_scale: 64.0 |
|
2024-03-09 17:48:03,475 INFO [train.py:997] (3/4) Epoch 39, batch 0, loss[loss=0.1398, simple_loss=0.2286, pruned_loss=0.02548, over 24199.00 frames. ], tot_loss[loss=0.1398, simple_loss=0.2286, pruned_loss=0.02548, over 24199.00 frames. ], batch size: 217, lr: 1.20e-02, grad_scale: 64.0 |
|
2024-03-09 17:48:03,476 INFO [train.py:1020] (3/4) Computing validation loss |
|
2024-03-09 17:48:11,687 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.7097, 5.2407, 5.6102, 5.2988], device='cuda:3') |
|
2024-03-09 17:48:12,746 INFO [train.py:1029] (3/4) Epoch 39, validation: loss=0.2141, simple_loss=0.3082, pruned_loss=0.06004, over 452978.00 frames. |
|
2024-03-09 17:48:12,746 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB |
|
2024-03-09 17:48:26,644 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.993e+01 6.884e+01 7.356e+01 8.157e+01 1.068e+02, threshold=1.471e+02, percent-clipped=0.0 |
|
2024-03-09 17:48:42,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=40093.333333333336, ans=0.125 |
|
2024-03-09 17:49:03,034 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0 |
|
2024-03-09 17:49:08,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40226.666666666664, ans=0.1 |
|
2024-03-09 17:49:09,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=40226.666666666664, ans=0.0 |
|
2024-03-09 17:49:29,147 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0 |
|
2024-03-09 17:49:41,667 INFO [train.py:997] (3/4) Epoch 39, batch 50, loss[loss=0.1371, simple_loss=0.2304, pruned_loss=0.02189, over 24247.00 frames. ], tot_loss[loss=0.1366, simple_loss=0.2274, pruned_loss=0.02289, over 1077777.57 frames. ], batch size: 281, lr: 1.20e-02, grad_scale: 64.0 |
|
2024-03-09 17:49:54,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=40360.0, ans=0.125 |
|
2024-03-09 17:50:14,995 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=12.0 |
|
2024-03-09 17:50:20,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=40493.333333333336, ans=0.035 |
|
2024-03-09 17:50:22,754 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=15.0 |
|
2024-03-09 17:50:26,893 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=22.5 |
|
2024-03-09 17:50:35,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=40560.0, ans=0.125 |
|
2024-03-09 17:50:59,997 INFO [train.py:997] (3/4) Epoch 39, batch 100, loss[loss=0.1445, simple_loss=0.2396, pruned_loss=0.02469, over 24066.00 frames. ], tot_loss[loss=0.1401, simple_loss=0.2304, pruned_loss=0.02493, over 1883949.65 frames. ], batch size: 365, lr: 1.20e-02, grad_scale: 64.0 |
|
2024-03-09 17:51:00,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=40693.333333333336, ans=0.125 |
|
2024-03-09 17:51:08,194 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 |
|
2024-03-09 17:51:09,406 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.940e+01 6.841e+01 7.461e+01 8.103e+01 1.250e+02, threshold=1.492e+02, percent-clipped=0.0 |
|
2024-03-09 17:51:20,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=40760.0, ans=0.0 |
|
2024-03-09 17:51:27,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=40760.0, ans=0.0020086956521739134 |
|
2024-03-09 17:52:18,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=40960.0, ans=0.2 |
|
2024-03-09 17:52:21,037 INFO [train.py:997] (3/4) Epoch 39, batch 150, loss[loss=0.1416, simple_loss=0.2332, pruned_loss=0.02503, over 24144.00 frames. ], tot_loss[loss=0.1388, simple_loss=0.2294, pruned_loss=0.02411, over 2518309.01 frames. ], batch size: 345, lr: 1.20e-02, grad_scale: 64.0 |
|
2024-03-09 17:52:27,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=41026.666666666664, ans=0.0 |
|
2024-03-09 17:53:16,194 INFO [train.py:997] (3/4) Epoch 40, batch 0, loss[loss=0.1332, simple_loss=0.2279, pruned_loss=0.01927, over 24126.00 frames. ], tot_loss[loss=0.1332, simple_loss=0.2279, pruned_loss=0.01927, over 24126.00 frames. ], batch size: 366, lr: 1.18e-02, grad_scale: 64.0 |
|
2024-03-09 17:53:16,194 INFO [train.py:1020] (3/4) Computing validation loss |
|
2024-03-09 17:53:25,708 INFO [train.py:1029] (3/4) Epoch 40, validation: loss=0.2148, simple_loss=0.3085, pruned_loss=0.06058, over 452978.00 frames. |
|
2024-03-09 17:53:25,709 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB |
|
2024-03-09 17:53:59,429 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. limit=10.0 |
|
2024-03-09 17:54:05,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=41213.333333333336, ans=0.125 |
|
2024-03-09 17:54:07,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41213.333333333336, ans=0.1 |
|
2024-03-09 17:54:12,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=41213.333333333336, ans=0.125 |
|
2024-03-09 17:54:20,635 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0 |
|
2024-03-09 17:54:47,012 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.979e+01 7.013e+01 7.603e+01 8.055e+01 1.247e+02, threshold=1.521e+02, percent-clipped=0.0 |
|
2024-03-09 17:54:51,547 INFO [train.py:997] (3/4) Epoch 40, batch 50, loss[loss=0.1389, simple_loss=0.2284, pruned_loss=0.02473, over 24190.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.2284, pruned_loss=0.02328, over 1068897.50 frames. ], batch size: 280, lr: 1.18e-02, grad_scale: 64.0 |
|
2024-03-09 17:54:58,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=41413.333333333336, ans=0.04949747468305833 |
|
2024-03-09 17:55:10,278 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 |
|
2024-03-09 17:55:31,194 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.17 vs. limit=15.0 |
|
2024-03-09 17:55:34,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=41546.666666666664, ans=0.025 |
|
2024-03-09 17:55:58,723 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 |
|
2024-03-09 17:56:11,517 INFO [train.py:997] (3/4) Epoch 40, batch 100, loss[loss=0.1397, simple_loss=0.2378, pruned_loss=0.02077, over 24135.00 frames. ], tot_loss[loss=0.137, simple_loss=0.2274, pruned_loss=0.02327, over 1889769.59 frames. ], batch size: 366, lr: 1.18e-02, grad_scale: 64.0 |
|
2024-03-09 17:56:17,034 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 |
|
2024-03-09 17:56:23,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=41746.666666666664, ans=0.2 |
|
2024-03-09 17:56:37,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=41813.333333333336, ans=0.0017797101449275356 |
|
2024-03-09 17:56:47,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=41880.0, ans=0.0017652173913043478 |
|
2024-03-09 17:57:15,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=42013.333333333336, ans=0.2 |
|
2024-03-09 17:57:18,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=42013.333333333336, ans=0.2 |
|
2024-03-09 17:57:25,919 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.807e+01 6.999e+01 7.479e+01 8.341e+01 1.133e+02, threshold=1.496e+02, percent-clipped=0.0 |
|
2024-03-09 17:57:29,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=42080.0, ans=0.025 |
|
2024-03-09 17:57:30,894 INFO [train.py:997] (3/4) Epoch 40, batch 150, loss[loss=0.128, simple_loss=0.2208, pruned_loss=0.01763, over 22982.00 frames. ], tot_loss[loss=0.137, simple_loss=0.2273, pruned_loss=0.02336, over 2516987.52 frames. ], batch size: 609, lr: 1.18e-02, grad_scale: 64.0 |
|
2024-03-09 17:57:32,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=42080.0, ans=0.0 |
|
2024-03-09 17:57:34,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42080.0, ans=0.1 |
|
2024-03-09 17:58:21,372 INFO [train.py:997] (3/4) Epoch 41, batch 0, loss[loss=0.131, simple_loss=0.2185, pruned_loss=0.02179, over 24135.00 frames. ], tot_loss[loss=0.131, simple_loss=0.2185, pruned_loss=0.02179, over 24135.00 frames. ], batch size: 240, lr: 1.16e-02, grad_scale: 64.0
2024-03-09 17:58:21,372 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:58:30,940 INFO [train.py:1029] (3/4) Epoch 41, validation: loss=0.2136, simple_loss=0.3076, pruned_loss=0.05982, over 452978.00 frames.
2024-03-09 17:58:30,941 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 17:58:47,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=42200.0, ans=0.125
2024-03-09 17:58:52,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=42200.0, ans=0.0016956521739130443
2024-03-09 17:59:24,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=42333.333333333336, ans=0.125
2024-03-09 17:59:41,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=42400.0, ans=0.2
2024-03-09 17:59:53,550 INFO [train.py:997] (3/4) Epoch 41, batch 50, loss[loss=0.1266, simple_loss=0.2204, pruned_loss=0.01633, over 24085.00 frames. ], tot_loss[loss=0.1351, simple_loss=0.2258, pruned_loss=0.02225, over 1067454.82 frames. ], batch size: 344, lr: 1.16e-02, grad_scale: 64.0
2024-03-09 18:00:19,292 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0
2024-03-09 18:00:37,651 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.37 vs. limit=15.0
2024-03-09 18:00:38,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=42600.0, ans=0.125
2024-03-09 18:00:55,611 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.788e+01 7.025e+01 7.943e+01 8.921e+01 1.202e+02, threshold=1.589e+02, percent-clipped=0.0
2024-03-09 18:01:00,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=42733.333333333336, ans=0.001579710144927535
2024-03-09 18:01:05,928 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0
2024-03-09 18:01:11,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=42733.333333333336, ans=0.0
2024-03-09 18:01:14,024 INFO [train.py:997] (3/4) Epoch 41, batch 100, loss[loss=0.1442, simple_loss=0.2327, pruned_loss=0.02788, over 23047.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.2276, pruned_loss=0.02308, over 1884862.24 frames. ], batch size: 102, lr: 1.16e-02, grad_scale: 64.0
2024-03-09 18:01:33,418 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5
2024-03-09 18:01:38,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=42866.666666666664, ans=0.125
2024-03-09 18:01:39,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=42866.666666666664, ans=22.5
2024-03-09 18:01:55,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=42933.333333333336, ans=0.125
2024-03-09 18:02:03,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=43000.0, ans=0.125
2024-03-09 18:02:05,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=43000.0, ans=0.125
2024-03-09 18:02:11,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=43000.0, ans=0.125
2024-03-09 18:02:34,745 INFO [train.py:997] (3/4) Epoch 41, batch 150, loss[loss=0.1369, simple_loss=0.2339, pruned_loss=0.01992, over 23932.00 frames. ], tot_loss[loss=0.1372, simple_loss=0.228, pruned_loss=0.02321, over 2527203.76 frames. ], batch size: 387, lr: 1.16e-02, grad_scale: 64.0
2024-03-09 18:02:35,558 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2024-03-09 18:02:37,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=43133.333333333336, ans=0.2
2024-03-09 18:02:39,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=43133.333333333336, ans=0.0014927536231884048
2024-03-09 18:03:28,793 INFO [train.py:997] (3/4) Epoch 42, batch 0, loss[loss=0.1431, simple_loss=0.2415, pruned_loss=0.02238, over 23966.00 frames. ], tot_loss[loss=0.1431, simple_loss=0.2415, pruned_loss=0.02238, over 23966.00 frames. ], batch size: 416, lr: 1.14e-02, grad_scale: 64.0
2024-03-09 18:03:28,793 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:03:38,340 INFO [train.py:1029] (3/4) Epoch 42, validation: loss=0.2135, simple_loss=0.3075, pruned_loss=0.05972, over 452978.00 frames.
2024-03-09 18:03:38,341 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:04:03,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=43253.333333333336, ans=0.0
2024-03-09 18:04:04,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=43253.333333333336, ans=0.0014666666666666665
2024-03-09 18:04:07,114 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=12.0
2024-03-09 18:04:09,198 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:04:12,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=43320.0, ans=0.0
2024-03-09 18:04:12,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=43320.0, ans=0.0
2024-03-09 18:04:29,006 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.865e+01 6.812e+01 7.244e+01 8.018e+01 1.063e+02, threshold=1.449e+02, percent-clipped=0.0
2024-03-09 18:04:50,271 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0
2024-03-09 18:04:58,767 INFO [train.py:997] (3/4) Epoch 42, batch 50, loss[loss=0.1467, simple_loss=0.2284, pruned_loss=0.03251, over 23906.00 frames. ], tot_loss[loss=0.1341, simple_loss=0.2251, pruned_loss=0.02157, over 1069473.35 frames. ], batch size: 153, lr: 1.14e-02, grad_scale: 64.0
2024-03-09 18:05:06,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=43520.0, ans=0.125
2024-03-09 18:05:19,865 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0
2024-03-09 18:05:49,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=43720.0, ans=0.125
2024-03-09 18:05:57,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=43720.0, ans=0.035
2024-03-09 18:06:20,948 INFO [train.py:997] (3/4) Epoch 42, batch 100, loss[loss=0.1437, simple_loss=0.2436, pruned_loss=0.02187, over 23829.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.225, pruned_loss=0.02169, over 1881314.16 frames. ], batch size: 447, lr: 1.14e-02, grad_scale: 64.0
2024-03-09 18:06:33,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=43853.333333333336, ans=0.1
2024-03-09 18:07:07,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=44053.333333333336, ans=0.001292753623188406
2024-03-09 18:07:09,737 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.750e+01 6.712e+01 7.266e+01 7.977e+01 1.080e+02, threshold=1.453e+02, percent-clipped=0.0
2024-03-09 18:07:10,771 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.30 vs. limit=6.0
2024-03-09 18:07:24,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=44120.0, ans=0.2
2024-03-09 18:07:39,991 INFO [train.py:997] (3/4) Epoch 42, batch 150, loss[loss=0.1351, simple_loss=0.2215, pruned_loss=0.02432, over 20004.00 frames. ], tot_loss[loss=0.1355, simple_loss=0.2269, pruned_loss=0.0221, over 2516694.39 frames. ], batch size: 60, lr: 1.14e-02, grad_scale: 64.0
2024-03-09 18:07:40,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=44186.666666666664, ans=0.2
2024-03-09 18:08:31,601 INFO [train.py:997] (3/4) Epoch 43, batch 0, loss[loss=0.1485, simple_loss=0.2454, pruned_loss=0.02575, over 23705.00 frames. ], tot_loss[loss=0.1485, simple_loss=0.2454, pruned_loss=0.02575, over 23705.00 frames. ], batch size: 485, lr: 1.12e-02, grad_scale: 64.0
2024-03-09 18:08:31,602 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:08:41,004 INFO [train.py:1029] (3/4) Epoch 43, validation: loss=0.2134, simple_loss=0.3077, pruned_loss=0.05952, over 452978.00 frames.
2024-03-09 18:08:41,005 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:08:53,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=44240.0, ans=0.125
2024-03-09 18:09:01,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=44306.666666666664, ans=0.0
2024-03-09 18:09:50,359 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.16 vs. limit=15.0
2024-03-09 18:09:50,990 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:09:58,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=44506.666666666664, ans=0.2
2024-03-09 18:10:01,088 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0
2024-03-09 18:10:01,381 INFO [train.py:997] (3/4) Epoch 43, batch 50, loss[loss=0.1285, simple_loss=0.2119, pruned_loss=0.02257, over 20430.00 frames. ], tot_loss[loss=0.1379, simple_loss=0.2283, pruned_loss=0.02375, over 1072487.93 frames. ], batch size: 62, lr: 1.12e-02, grad_scale: 64.0
2024-03-09 18:10:36,516 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.916e+01 6.864e+01 7.263e+01 8.155e+01 1.054e+02, threshold=1.453e+02, percent-clipped=0.0
2024-03-09 18:10:40,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=44706.666666666664, ans=0.125
2024-03-09 18:10:46,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=44773.333333333336, ans=0.125
2024-03-09 18:11:19,221 INFO [train.py:997] (3/4) Epoch 43, batch 100, loss[loss=0.1368, simple_loss=0.2324, pruned_loss=0.02057, over 24162.00 frames. ], tot_loss[loss=0.1356, simple_loss=0.2261, pruned_loss=0.02253, over 1889264.58 frames. ], batch size: 345, lr: 1.12e-02, grad_scale: 64.0
2024-03-09 18:12:10,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=45106.666666666664, ans=0.125
2024-03-09 18:12:33,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=45173.333333333336, ans=0.0
2024-03-09 18:12:34,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=45173.333333333336, ans=0.0
2024-03-09 18:12:40,863 INFO [train.py:997] (3/4) Epoch 43, batch 150, loss[loss=0.113, simple_loss=0.2075, pruned_loss=0.009225, over 21451.00 frames. ], tot_loss[loss=0.1355, simple_loss=0.2267, pruned_loss=0.02216, over 2516550.55 frames. ], batch size: 718, lr: 1.12e-02, grad_scale: 32.0
2024-03-09 18:12:49,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=45240.0, ans=0.125
2024-03-09 18:13:36,399 INFO [train.py:997] (3/4) Epoch 44, batch 0, loss[loss=0.1242, simple_loss=0.2121, pruned_loss=0.0181, over 23603.00 frames. ], tot_loss[loss=0.1242, simple_loss=0.2121, pruned_loss=0.0181, over 23603.00 frames. ], batch size: 128, lr: 1.10e-02, grad_scale: 32.0
2024-03-09 18:13:36,399 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:13:45,433 INFO [train.py:1029] (3/4) Epoch 44, validation: loss=0.2121, simple_loss=0.3064, pruned_loss=0.05891, over 452978.00 frames.
2024-03-09 18:13:45,434 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:14:02,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=45293.333333333336, ans=0.125
2024-03-09 18:14:06,614 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0
2024-03-09 18:14:07,943 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=12.0
2024-03-09 18:14:19,825 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.880e+01 6.918e+01 7.525e+01 8.097e+01 1.200e+02, threshold=1.505e+02, percent-clipped=0.0
2024-03-09 18:15:05,780 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0
2024-03-09 18:15:09,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=45560.0, ans=0.2
2024-03-09 18:15:12,599 INFO [train.py:997] (3/4) Epoch 44, batch 50, loss[loss=0.1368, simple_loss=0.2234, pruned_loss=0.02505, over 24053.00 frames. ], tot_loss[loss=0.1364, simple_loss=0.2266, pruned_loss=0.02309, over 1070767.10 frames. ], batch size: 165, lr: 1.10e-02, grad_scale: 32.0
2024-03-09 18:15:15,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=45626.666666666664, ans=0.125
2024-03-09 18:15:22,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=45626.666666666664, ans=0.125
2024-03-09 18:15:48,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=45760.0, ans=0.125
2024-03-09 18:15:52,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=45760.0, ans=0.125
2024-03-09 18:15:58,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=45826.666666666664, ans=0.2
2024-03-09 18:16:20,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=45893.333333333336, ans=0.125
2024-03-09 18:16:30,683 INFO [train.py:997] (3/4) Epoch 44, batch 100, loss[loss=0.1365, simple_loss=0.2287, pruned_loss=0.02215, over 24256.00 frames. ], tot_loss[loss=0.1377, simple_loss=0.2287, pruned_loss=0.0234, over 1887612.04 frames. ], batch size: 281, lr: 1.10e-02, grad_scale: 16.0
2024-03-09 18:16:34,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=45960.0, ans=0.95
2024-03-09 18:16:45,383 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.21 vs. limit=6.0
2024-03-09 18:16:49,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=46026.666666666664, ans=0.125
2024-03-09 18:16:56,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=46026.666666666664, ans=0.0008637681159420294
2024-03-09 18:17:01,035 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.671e+01 6.824e+01 7.356e+01 8.103e+01 1.148e+02, threshold=1.471e+02, percent-clipped=0.0
2024-03-09 18:17:09,518 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=22.5
2024-03-09 18:17:22,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=46160.0, ans=0.04949747468305833
2024-03-09 18:17:36,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=46226.666666666664, ans=0.0008202898550724643
2024-03-09 18:17:46,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=46226.666666666664, ans=0.0
2024-03-09 18:17:51,976 INFO [train.py:997] (3/4) Epoch 44, batch 150, loss[loss=0.1265, simple_loss=0.2256, pruned_loss=0.01369, over 24226.00 frames. ], tot_loss[loss=0.1366, simple_loss=0.2276, pruned_loss=0.02283, over 2517182.93 frames. ], batch size: 327, lr: 1.10e-02, grad_scale: 16.0
2024-03-09 18:18:43,508 INFO [train.py:997] (3/4) Epoch 45, batch 0, loss[loss=0.1409, simple_loss=0.2269, pruned_loss=0.02746, over 24064.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.2269, pruned_loss=0.02746, over 24064.00 frames. ], batch size: 165, lr: 1.09e-02, grad_scale: 32.0
2024-03-09 18:18:43,509 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:18:53,093 INFO [train.py:1029] (3/4) Epoch 45, validation: loss=0.2137, simple_loss=0.3089, pruned_loss=0.05927, over 452978.00 frames.
2024-03-09 18:18:53,094 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:19:05,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46346.666666666664, ans=0.1
2024-03-09 18:19:38,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=46480.0, ans=0.125
2024-03-09 18:19:39,048 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0
2024-03-09 18:19:45,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46546.666666666664, ans=0.1
2024-03-09 18:19:51,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=46546.666666666664, ans=0.125
2024-03-09 18:19:55,489 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0
2024-03-09 18:20:07,000 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:20:11,480 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:20:16,257 INFO [train.py:997] (3/4) Epoch 45, batch 50, loss[loss=0.1298, simple_loss=0.2171, pruned_loss=0.02127, over 24318.00 frames. ], tot_loss[loss=0.135, simple_loss=0.2262, pruned_loss=0.02193, over 1073449.42 frames. ], batch size: 208, lr: 1.08e-02, grad_scale: 32.0
2024-03-09 18:20:22,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46680.0, ans=0.1
2024-03-09 18:20:29,932 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.843e+01 6.817e+01 7.386e+01 8.152e+01 1.203e+02, threshold=1.477e+02, percent-clipped=0.0
2024-03-09 18:20:39,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=46746.666666666664, ans=0.5
2024-03-09 18:20:43,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=46746.666666666664, ans=0.05
2024-03-09 18:20:47,466 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. limit=10.0
2024-03-09 18:20:51,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=46813.333333333336, ans=0.125
2024-03-09 18:20:51,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46813.333333333336, ans=0.1
2024-03-09 18:21:25,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=46946.666666666664, ans=0.0006637681159420306
2024-03-09 18:21:35,473 INFO [train.py:997] (3/4) Epoch 45, batch 100, loss[loss=0.1357, simple_loss=0.2272, pruned_loss=0.0221, over 24266.00 frames. ], tot_loss[loss=0.135, simple_loss=0.226, pruned_loss=0.02198, over 1890623.48 frames. ], batch size: 311, lr: 1.08e-02, grad_scale: 32.0
2024-03-09 18:22:22,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=47213.333333333336, ans=0.2
2024-03-09 18:22:22,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=47213.333333333336, ans=0.0
2024-03-09 18:22:41,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47280.0, ans=0.1
2024-03-09 18:22:45,466 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0
2024-03-09 18:22:55,744 INFO [train.py:997] (3/4) Epoch 45, batch 150, loss[loss=0.139, simple_loss=0.2273, pruned_loss=0.02528, over 24223.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.2259, pruned_loss=0.02176, over 2515683.97 frames. ], batch size: 229, lr: 1.08e-02, grad_scale: 16.0
2024-03-09 18:22:59,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=47346.666666666664, ans=0.2
2024-03-09 18:23:50,625 INFO [train.py:997] (3/4) Epoch 46, batch 0, loss[loss=0.1476, simple_loss=0.2425, pruned_loss=0.02637, over 23723.00 frames. ], tot_loss[loss=0.1476, simple_loss=0.2425, pruned_loss=0.02637, over 23723.00 frames. ], batch size: 486, lr: 1.07e-02, grad_scale: 16.0
2024-03-09 18:23:50,626 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:24:00,487 INFO [train.py:1029] (3/4) Epoch 46, validation: loss=0.2142, simple_loss=0.3085, pruned_loss=0.05997, over 452978.00 frames.
2024-03-09 18:24:00,488 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:24:05,180 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.866e+01 6.849e+01 7.495e+01 7.996e+01 1.078e+02, threshold=1.499e+02, percent-clipped=0.0
2024-03-09 18:24:06,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=47400.0, ans=0.125
2024-03-09 18:24:14,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=47400.0, ans=0.2
2024-03-09 18:24:16,593 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=15.0
2024-03-09 18:24:20,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=47466.666666666664, ans=0.2
2024-03-09 18:24:26,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=47466.666666666664, ans=0.0005507246376811603
2024-03-09 18:24:27,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47466.666666666664, ans=0.1
2024-03-09 18:24:28,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=47466.666666666664, ans=15.0
2024-03-09 18:24:37,977 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0
2024-03-09 18:24:41,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=47533.333333333336, ans=0.0005362318840579708
2024-03-09 18:24:58,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=47600.0, ans=0.125
2024-03-09 18:25:25,827 INFO [train.py:997] (3/4) Epoch 46, batch 50, loss[loss=0.1312, simple_loss=0.2258, pruned_loss=0.01824, over 24210.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.2231, pruned_loss=0.02053, over 1071835.54 frames. ], batch size: 295, lr: 1.07e-02, grad_scale: 16.0
2024-03-09 18:26:45,329 INFO [train.py:997] (3/4) Epoch 46, batch 100, loss[loss=0.1194, simple_loss=0.2152, pruned_loss=0.01181, over 22860.00 frames. ], tot_loss[loss=0.1329, simple_loss=0.2239, pruned_loss=0.02092, over 1888353.71 frames. ], batch size: 608, lr: 1.06e-02, grad_scale: 16.0
2024-03-09 18:26:49,977 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.653e+01 6.627e+01 7.164e+01 7.678e+01 1.012e+02, threshold=1.433e+02, percent-clipped=0.0
2024-03-09 18:27:05,755 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.46 vs. limit=15.0
2024-03-09 18:27:55,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=48333.333333333336, ans=0.1
2024-03-09 18:28:06,136 INFO [train.py:997] (3/4) Epoch 46, batch 150, loss[loss=0.1635, simple_loss=0.2528, pruned_loss=0.0371, over 23209.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.2259, pruned_loss=0.02125, over 2526246.99 frames. ], batch size: 534, lr: 1.06e-02, grad_scale: 16.0
2024-03-09 18:28:08,672 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=15.0
2024-03-09 18:29:00,564 INFO [train.py:997] (3/4) Epoch 47, batch 0, loss[loss=0.134, simple_loss=0.2278, pruned_loss=0.02012, over 24208.00 frames. ], tot_loss[loss=0.134, simple_loss=0.2278, pruned_loss=0.02012, over 24208.00 frames. ], batch size: 295, lr: 1.05e-02, grad_scale: 32.0
2024-03-09 18:29:00,565 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:29:10,389 INFO [train.py:1029] (3/4) Epoch 47, validation: loss=0.2152, simple_loss=0.3095, pruned_loss=0.06041, over 452978.00 frames.
2024-03-09 18:29:10,390 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:29:11,567 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=22.5
2024-03-09 18:29:15,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=48453.333333333336, ans=0.125
2024-03-09 18:29:19,097 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.59 vs. limit=10.0
2024-03-09 18:29:42,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=48586.666666666664, ans=0.1
2024-03-09 18:30:28,053 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.832e+01 6.822e+01 7.253e+01 7.989e+01 1.051e+02, threshold=1.451e+02, percent-clipped=0.0
2024-03-09 18:30:34,258 INFO [train.py:997] (3/4) Epoch 47, batch 50, loss[loss=0.1498, simple_loss=0.2312, pruned_loss=0.0342, over 23977.00 frames. ], tot_loss[loss=0.1308, simple_loss=0.2213, pruned_loss=0.02016, over 1069116.76 frames. ], batch size: 153, lr: 1.05e-02, grad_scale: 16.0
2024-03-09 18:30:41,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=48786.666666666664, ans=0.125
2024-03-09 18:31:08,154 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0
2024-03-09 18:31:09,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=48920.0, ans=0.00023478260869565226
2024-03-09 18:31:12,984 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0
2024-03-09 18:31:25,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=48986.666666666664, ans=0.125
2024-03-09 18:31:28,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=48986.666666666664, ans=0.2
2024-03-09 18:31:28,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=48986.666666666664, ans=0.0
2024-03-09 18:31:44,546 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:31:53,479 INFO [train.py:997] (3/4) Epoch 47, batch 100, loss[loss=0.1403, simple_loss=0.2377, pruned_loss=0.02142, over 23945.00 frames. ], tot_loss[loss=0.134, simple_loss=0.225, pruned_loss=0.02154, over 1881478.66 frames. ], batch size: 387, lr: 1.05e-02, grad_scale: 8.0
2024-03-09 18:32:02,423 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0
2024-03-09 18:32:05,335 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.83 vs. limit=10.0
2024-03-09 18:32:43,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=49320.0, ans=0.125
2024-03-09 18:32:48,835 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0
2024-03-09 18:32:54,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=49320.0, ans=0.0
2024-03-09 18:32:58,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=49386.666666666664, ans=0.0
2024-03-09 18:33:07,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=49386.666666666664, ans=0.0
2024-03-09 18:33:10,428 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.939e+01 7.123e+01 7.707e+01 8.583e+01 1.160e+02, threshold=1.541e+02, percent-clipped=0.0
2024-03-09 18:33:15,538 INFO [train.py:997] (3/4) Epoch 47, batch 150, loss[loss=0.1354, simple_loss=0.2301, pruned_loss=0.02033, over 24039.00 frames. ], tot_loss[loss=0.1345, simple_loss=0.2259, pruned_loss=0.02161, over 2511654.63 frames. ], batch size: 344, lr: 1.05e-02, grad_scale: 8.0
2024-03-09 18:33:15,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=49453.333333333336, ans=0.1
2024-03-09 18:34:03,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=49506.666666666664, ans=0.125
2024-03-09 18:34:05,686 INFO [train.py:997] (3/4) Epoch 48, batch 0, loss[loss=0.118, simple_loss=0.2089, pruned_loss=0.01354, over 23939.00 frames. ], tot_loss[loss=0.118, simple_loss=0.2089, pruned_loss=0.01354, over 23939.00 frames. ], batch size: 142, lr: 1.03e-02, grad_scale: 16.0
2024-03-09 18:34:05,687 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:34:15,169 INFO [train.py:1029] (3/4) Epoch 48, validation: loss=0.2149, simple_loss=0.3083, pruned_loss=0.06081, over 452978.00 frames.
2024-03-09 18:34:15,170 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:34:33,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=49573.333333333336, ans=0.0
2024-03-09 18:34:44,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=49573.333333333336, ans=0.1
2024-03-09 18:34:45,546 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.30 vs. limit=10.0
2024-03-09 18:35:04,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=49640.0, ans=0.125
2024-03-09 18:35:24,460 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=12.0
2024-03-09 18:35:39,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=49840.0, ans=0.125
2024-03-09 18:35:40,447 INFO [train.py:997] (3/4) Epoch 48, batch 50, loss[loss=0.1303, simple_loss=0.2189, pruned_loss=0.02089, over 24226.00 frames. ], tot_loss[loss=0.1329, simple_loss=0.2236, pruned_loss=0.02108, over 1074716.06 frames. ], batch size: 217, lr: 1.03e-02, grad_scale: 16.0
2024-03-09 18:35:41,581 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0
2024-03-09 18:35:53,152 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:36:03,370 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0
2024-03-09 18:36:11,064 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.01 vs. limit=10.0
2024-03-09 18:36:16,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=49973.333333333336, ans=0.125
2024-03-09 18:36:22,286 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=15.0
2024-03-09 18:36:32,171 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:36:39,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50040.0, ans=0.1
2024-03-09 18:36:39,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=50040.0, ans=0.125
2024-03-09 18:36:42,443 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.770e+01 6.729e+01 7.301e+01 8.005e+01 9.735e+01, threshold=1.460e+02, percent-clipped=0.0
2024-03-09 18:36:47,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=50106.666666666664, ans=0.2
2024-03-09 18:36:59,143 INFO [train.py:997] (3/4) Epoch 48, batch 100, loss[loss=0.1205, simple_loss=0.2024, pruned_loss=0.01929, over 23722.00 frames. ], tot_loss[loss=0.1341, simple_loss=0.2248, pruned_loss=0.02172, over 1889845.79 frames. ], batch size: 117, lr: 1.03e-02, grad_scale: 16.0
2024-03-09 18:37:25,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=50240.0, ans=0.07
2024-03-09 18:37:34,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=50306.666666666664, ans=0.2
2024-03-09 18:37:50,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=50373.333333333336, ans=0.0
2024-03-09 18:38:02,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=50373.333333333336, ans=0.0
2024-03-09 18:38:02,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=50373.333333333336, ans=0.125
2024-03-09 18:38:20,105 INFO [train.py:997] (3/4) Epoch 48, batch 150, loss[loss=0.1282, simple_loss=0.2205, pruned_loss=0.0179, over 24263.00 frames. ], tot_loss[loss=0.1335, simple_loss=0.2247, pruned_loss=0.02112, over 2507628.60 frames. ], batch size: 241, lr: 1.03e-02, grad_scale: 8.0
2024-03-09 18:38:29,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=50506.666666666664, ans=0.09899494936611666
2024-03-09 18:38:29,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=50506.666666666664, ans=0.0
2024-03-09 18:39:15,061 INFO [train.py:997] (3/4) Epoch 49, batch 0, loss[loss=0.1372, simple_loss=0.2326, pruned_loss=0.02094, over 23746.00 frames. ], tot_loss[loss=0.1372, simple_loss=0.2326, pruned_loss=0.02094, over 23746.00 frames. ], batch size: 486, lr: 1.02e-02, grad_scale: 16.0
2024-03-09 18:39:15,062 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:39:24,778 INFO [train.py:1029] (3/4) Epoch 49, validation: loss=0.2171, simple_loss=0.31, pruned_loss=0.06203, over 452978.00 frames.
2024-03-09 18:39:24,779 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:39:45,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=50626.666666666664, ans=0.125
2024-03-09 18:39:51,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=50626.666666666664, ans=0.0
2024-03-09 18:40:11,811 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0
2024-03-09 18:40:12,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=50693.333333333336, ans=0.2
2024-03-09 18:40:23,493 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.861e+01 6.914e+01 7.599e+01 8.430e+01 1.205e+02, threshold=1.520e+02, percent-clipped=0.0
2024-03-09 18:40:37,111 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0
2024-03-09 18:40:47,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=50826.666666666664, ans=0.125
2024-03-09 18:40:51,381 INFO [train.py:997] (3/4) Epoch 49, batch 50, loss[loss=0.1153, simple_loss=0.2104, pruned_loss=0.01008, over 21415.00 frames. ], tot_loss[loss=0.1328, simple_loss=0.2238, pruned_loss=0.02091, over 1064043.14 frames. ], batch size: 718, lr: 1.02e-02, grad_scale: 16.0
2024-03-09 18:41:22,009 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.91 vs. limit=15.0
2024-03-09 18:41:40,434 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=12.0
2024-03-09 18:41:51,339 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0
2024-03-09 18:42:10,662 INFO [train.py:997] (3/4) Epoch 49, batch 100, loss[loss=0.1366, simple_loss=0.2256, pruned_loss=0.02384, over 24195.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.2229, pruned_loss=0.02069, over 1879066.52 frames. ], batch size: 188, lr: 1.01e-02, grad_scale: 8.0
2024-03-09 18:42:12,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=51226.666666666664, ans=0.125
2024-03-09 18:42:18,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=51226.666666666664, ans=0.05
2024-03-09 18:42:18,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=51226.666666666664, ans=0.125
2024-03-09 18:42:25,546 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5
2024-03-09 18:42:27,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=51293.333333333336, ans=0.1
2024-03-09 18:42:41,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=51293.333333333336, ans=0.0
2024-03-09 18:43:04,162 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.035e+01 6.804e+01 7.380e+01 7.884e+01 1.078e+02, threshold=1.476e+02, percent-clipped=0.0
2024-03-09 18:43:17,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=51493.333333333336, ans=0.0
2024-03-09 18:43:30,600 INFO [train.py:997] (3/4) Epoch 49, batch 150, loss[loss=0.1279, simple_loss=0.2197, pruned_loss=0.0181, over 24245.00 frames. ], tot_loss[loss=0.1335, simple_loss=0.2243, pruned_loss=0.02132, over 2507386.41 frames. ], batch size: 198, lr: 1.01e-02, grad_scale: 8.0
2024-03-09 18:43:39,347 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.74 vs. limit=6.0
2024-03-09 18:44:22,357 INFO [train.py:997] (3/4) Epoch 50, batch 0, loss[loss=0.134, simple_loss=0.2338, pruned_loss=0.0171, over 23883.00 frames. ], tot_loss[loss=0.134, simple_loss=0.2338, pruned_loss=0.0171, over 23883.00 frames. ], batch size: 447, lr: 1.00e-02, grad_scale: 16.0
2024-03-09 18:44:22,357 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:44:30,928 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.2012, 3.7216, 3.9667, 2.6843], device='cuda:3')
2024-03-09 18:44:31,920 INFO [train.py:1029] (3/4) Epoch 50, validation: loss=0.2164, simple_loss=0.3113, pruned_loss=0.06071, over 452978.00 frames.
2024-03-09 18:44:31,920 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:44:32,930 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.00 vs. limit=22.5
2024-03-09 18:44:42,405 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0
2024-03-09 18:45:29,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=51813.333333333336, ans=0.09899494936611666
2024-03-09 18:45:40,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=51880.0, ans=0.0
2024-03-09 18:45:40,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=51880.0, ans=0.0
2024-03-09 18:45:42,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=51880.0, ans=0.125
2024-03-09 18:45:57,123 INFO [train.py:997] (3/4) Epoch 50, batch 50, loss[loss=0.1173, simple_loss=0.2122, pruned_loss=0.01116, over 22836.00 frames. ], tot_loss[loss=0.131, simple_loss=0.2224, pruned_loss=0.01983, over 1065670.29 frames. ], batch size: 609, lr: 1.00e-02, grad_scale: 8.0
2024-03-09 18:46:16,626 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.96 vs. limit=10.0
2024-03-09 18:46:37,167 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.908e+01 6.888e+01 7.222e+01 7.907e+01 1.090e+02, threshold=1.444e+02, percent-clipped=0.0
2024-03-09 18:47:13,805 INFO [train.py:997] (3/4) Epoch 50, batch 100, loss[loss=0.1329, simple_loss=0.2276, pruned_loss=0.01912, over 24203.00 frames. ], tot_loss[loss=0.132, simple_loss=0.2232, pruned_loss=0.02042, over 1867371.10 frames. ], batch size: 295, lr: 9.99e-03, grad_scale: 8.0
2024-03-09 18:47:19,505 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.83 vs. limit=15.0
2024-03-09 18:47:22,821 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=15.0
2024-03-09 18:47:56,121 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.89 vs. limit=10.0
2024-03-09 18:48:27,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=52546.666666666664, ans=0.0
2024-03-09 18:48:27,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=52546.666666666664, ans=0.0
2024-03-09 18:48:28,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=52546.666666666664, ans=0.0
2024-03-09 18:48:36,459 INFO [train.py:997] (3/4) Epoch 50, batch 150, loss[loss=0.1306, simple_loss=0.2205, pruned_loss=0.02031, over 24169.00 frames. ], tot_loss[loss=0.1331, simple_loss=0.2249, pruned_loss=0.02064, over 2500487.36 frames. ], batch size: 217, lr: 9.97e-03, grad_scale: 8.0
2024-03-09 18:48:48,483 INFO [train.py:1248] (3/4) Done!
|