2023-05-10 14:59:59,102 INFO [train.py:1091] (0/2) Training started
2023-05-10 14:59:59,105 INFO [train.py:1101] (0/2) Device: cuda:0
2023-05-10 14:59:59,109 INFO [train.py:1110] (0/2) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.23.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '7efe024b23078ffa0bcb5598afff14f356edae7c', 'k2-git-date': 'Mon Jan 30 20:22:57 2023', 'lhotse-version': '1.12.0.dev+git.891bad1.clean', 'torch-version': '1.10.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.8', 'icefall-git-branch': 'from_dan_scaled_adam_exp1119', 'icefall-git-sha1': '432b2fa3-dirty', 'icefall-git-date': 'Mon May 8 18:46:45 2023', 'icefall-path': '/ceph-zw/workspace/zipformer/icefall_dan_streaming', 'k2-path': '/ceph-zw/workspace/k2/k2/k2/python/k2/__init__.py', 'lhotse-path': '/ceph-zw/workspace/share/lhotse/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-6-0423201309-7c68fd68fb-6cszs', 'IP address': '10.177.28.83'}, 'world_size': 2, 'master_port': 12348, 'tensorboard': True, 'num_epochs': 40, 'start_epoch': 31, 'start_batch': 0, 'exp_dir': PosixPath('pruned_transducer_stateless7/exp1119-smaller-md1500'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.04, 'lr_batches': 7500, 'lr_epochs': 3.5, 'lr_warmup_start': 0.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,2,2,2,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,768,768,768,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,256,256,256,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,192,192,192,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'full_libri': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1500, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'vocab_size': 500}
2023-05-10 14:59:59,110 INFO [train.py:1112] (0/2) About to create model
2023-05-10 14:59:59,603 INFO [train.py:1116] (0/2) Number of model parameters: 23285615
2023-05-10 15:00:00,113 INFO [checkpoint.py:112] (0/2) Loading checkpoint from pruned_transducer_stateless7/exp1119-smaller-md1500/epoch-30.pt
2023-05-10 15:00:00,849 INFO [checkpoint.py:131] (0/2) Loading averaged model
2023-05-10 15:00:06,502 INFO [train.py:1131] (0/2) Using DDP
2023-05-10 15:00:06,746 INFO [train.py:1145] (0/2) Loading optimizer state dict
2023-05-10 15:00:06,999 INFO [train.py:1153] (0/2) Loading scheduler state dict
2023-05-10 15:00:07,000 INFO [asr_datamodule.py:409] (0/2) About to get train-clean-100 cuts
2023-05-10 15:00:07,017 INFO [asr_datamodule.py:416] (0/2) About to get train-clean-360 cuts
2023-05-10 15:00:07,019 INFO [asr_datamodule.py:423] (0/2) About to get train-other-500 cuts
2023-05-10 15:00:07,020 INFO [asr_datamodule.py:225] (0/2) Enable MUSAN
2023-05-10 15:00:07,020 INFO [asr_datamodule.py:226] (0/2) About to get Musan cuts
2023-05-10 15:00:09,418 INFO [asr_datamodule.py:254] (0/2) Enable SpecAugment
2023-05-10 15:00:09,418 INFO [asr_datamodule.py:255] (0/2) Time warp factor: 80
2023-05-10 15:00:09,418 INFO [asr_datamodule.py:267] (0/2) Num frame mask: 10
2023-05-10 15:00:09,418 INFO [asr_datamodule.py:280] (0/2) About to create train dataset
2023-05-10 15:00:09,419 INFO [asr_datamodule.py:309] (0/2) Using DynamicBucketingSampler.
2023-05-10 15:00:14,241 WARNING [train.py:1182] (0/2) Exclude cut with ID 7859-102521-0017-7548-0_sp1.1 from training. Duration: 22.2954375
2023-05-10 15:00:15,890 INFO [asr_datamodule.py:324] (0/2) About to create train dataloader
2023-05-10 15:00:15,891 INFO [asr_datamodule.py:430] (0/2) About to get dev-clean cuts
2023-05-10 15:00:15,894 INFO [asr_datamodule.py:437] (0/2) About to get dev-other cuts
2023-05-10 15:00:15,895 INFO [asr_datamodule.py:355] (0/2) About to create dev dataset
2023-05-10 15:00:16,233 INFO [asr_datamodule.py:374] (0/2) About to create dev dataloader
2023-05-10 15:00:16,234 INFO [train.py:1329] (0/2) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
2023-05-10 15:00:21,433 WARNING [train.py:1182] (0/2) Exclude cut with ID 7859-102521-0017-7548-0_sp1.1 from training. Duration: 22.2954375
2023-05-10 15:00:28,399 WARNING [train.py:1182] (0/2) Exclude cut with ID 7859-102521-0017-7548-0_sp1.1 from training. Duration: 22.2954375
2023-05-10 15:00:33,358 WARNING [train.py:1182] (0/2) Exclude cut with ID 298-126791-0067-24026-0_sp0.9 from training. Duration: 21.438875
2023-05-10 15:00:33,610 WARNING [train.py:1182] (0/2) Exclude cut with ID 5652-39938-0025-23684-0_sp0.9 from training. Duration: 22.2055625
2023-05-10 15:00:43,481 WARNING [train.py:1182] (0/2) Exclude cut with ID 7859-102521-0017-7548-0 from training. Duration: 24.525
2023-05-10 15:00:45,106 WARNING [train.py:1182] (0/2) Exclude cut with ID 3699-47246-0007-3408-0_sp0.9 from training. Duration: 20.26675
2023-05-10 15:00:45,763 WARNING [train.py:1182] (0/2) Exclude cut with ID 7859-102521-0017-7548-0_sp0.9 from training. Duration: 27.25
2023-05-10 15:00:49,961 WARNING [train.py:1182] (0/2) Exclude cut with ID 6426-64292-0017-15984-0 from training. Duration: 21.68
2023-05-10 15:00:50,542 WARNING [train.py:1182] (0/2) Exclude cut with ID 4278-13270-0007-59342-0 from training. Duration: 21.6300625
2023-05-10 15:00:51,688 WARNING [train.py:1182] (0/2) Exclude cut with ID 4278-13270-0007-59342-0_sp0.9 from training. Duration: 24.033375
2023-05-10 15:00:54,504 WARNING [train.py:1182] (0/2) Exclude cut with ID 4278-13270-0009-59344-0 from training. Duration: 22.905
2023-05-10 15:00:54,558 WARNING [train.py:1182] (0/2) Exclude cut with ID 5622-44585-0006-90525-0_sp1.1 from training. Duration: 23.4318125
2023-05-10 15:01:00,811 WARNING [train.py:1182] (0/2) Exclude cut with ID 4278-13270-0009-59344-0_sp1.1 from training. Duration: 20.82275
2023-05-10 15:01:01,478 WARNING [train.py:1182] (0/2) Exclude cut with ID 4278-13270-0009-59344-0_sp0.9 from training. Duration: 25.45
2023-05-10 15:01:04,395 WARNING [train.py:1182] (0/2) Exclude cut with ID 5622-44585-0006-90525-0 from training. Duration: 25.775
2023-05-10 15:01:05,455 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0071-62375-0_sp0.9 from training. Duration: 22.25
2023-05-10 15:01:06,776 WARNING [train.py:1182] (0/2) Exclude cut with ID 3972-170212-0014-23379-0 from training. Duration: 26.205
2023-05-10 15:01:08,196 WARNING [train.py:1182] (0/2) Exclude cut with ID 5239-32139-0047-9341-0_sp0.9 from training. Duration: 30.1555625
2023-05-10 15:01:08,465 WARNING [train.py:1182] (0/2) Exclude cut with ID 1265-135635-0050-6781-0_sp0.9 from training. Duration: 21.8333125
2023-05-10 15:01:08,896 WARNING [train.py:1182] (0/2) Exclude cut with ID 1914-133440-0024-94914-0_sp1.1 from training. Duration: 20.6545625
2023-05-10 15:01:10,934 WARNING [train.py:1182] (0/2) Exclude cut with ID 7395-89880-0045-39920-0_sp0.9 from training. Duration: 20.52225
2023-05-10 15:01:11,944 WARNING [train.py:1182] (0/2) Exclude cut with ID 3972-170212-0014-23379-0_sp0.9 from training. Duration: 29.1166875
2023-05-10 15:01:15,386 WARNING [train.py:1182] (0/2) Exclude cut with ID 543-133211-0007-59831-0_sp0.9 from training. Duration: 21.388875
2023-05-10 15:01:17,131 WARNING [train.py:1182] (0/2) Exclude cut with ID 1914-133440-0024-94914-0 from training. Duration: 22.72
2023-05-10 15:01:17,195 WARNING [train.py:1182] (0/2) Exclude cut with ID 1914-133440-0031-94921-0_sp0.9 from training. Duration: 22.7444375
2023-05-10 15:01:19,329 WARNING [train.py:1182] (0/2) Exclude cut with ID 4133-6541-0027-40495-0_sp1.1 from training. Duration: 0.9681875
2023-05-10 15:01:19,510 WARNING [train.py:1182] (0/2) Exclude cut with ID 6330-62851-0022-91297-0_sp0.9 from training. Duration: 22.3166875
2023-05-10 15:01:20,376 WARNING [train.py:1182] (0/2) Exclude cut with ID 543-133212-0015-59917-0_sp0.9 from training. Duration: 21.8166875
2023-05-10 15:01:25,389 WARNING [train.py:1182] (0/2) Exclude cut with ID 4957-30119-0041-23990-0_sp0.9 from training. Duration: 20.22775
2023-05-10 15:01:28,129 WARNING [train.py:1182] (0/2) Exclude cut with ID 5239-32139-0047-9341-0_sp1.1 from training. Duration: 24.67275
2023-05-10 15:01:29,520 WARNING [train.py:1182] (0/2) Exclude cut with ID 3082-165428-0081-50734-0_sp0.9 from training. Duration: 21.8055625
2023-05-10 15:01:31,332 WARNING [train.py:1182] (0/2) Exclude cut with ID 3340-169293-0054-76830-0_sp0.9 from training. Duration: 22.6666875
2023-05-10 15:01:34,966 WARNING [train.py:1182] (0/2) Exclude cut with ID 2411-132532-0017-82279-0_sp1.1 from training. Duration: 0.9681875
2023-05-10 15:01:36,258 WARNING [train.py:1182] (0/2) Exclude cut with ID 6330-62850-0007-91323-0 from training. Duration: 22.485
2023-05-10 15:01:38,102 WARNING [train.py:1182] (0/2) Exclude cut with ID 3972-170212-0014-23379-0_sp1.1 from training. Duration: 23.82275
2023-05-10 15:01:39,343 WARNING [train.py:1182] (0/2) Exclude cut with ID 4860-13185-0032-76709-0 from training. Duration: 20.77
2023-05-10 15:01:39,711 WARNING [train.py:1182] (0/2) Exclude cut with ID 6426-64292-0017-15984-0_sp0.9 from training. Duration: 24.088875
2023-05-10 15:01:41,145 WARNING [train.py:1182] (0/2) Exclude cut with ID 6330-62850-0007-91323-0_sp1.1 from training. Duration: 20.4409375
2023-05-10 15:01:45,520 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0062-62366-0_sp0.9 from training. Duration: 22.511125
2023-05-10 15:01:45,553 WARNING [train.py:1182] (0/2) Exclude cut with ID 7395-89880-0031-39906-0 from training. Duration: 20.675
2023-05-10 15:01:50,394 WARNING [train.py:1182] (0/2) Exclude cut with ID 6330-62850-0007-91323-0_sp0.9 from training. Duration: 24.9833125
2023-05-10 15:01:52,573 WARNING [train.py:1182] (0/2) Exclude cut with ID 5239-32139-0047-9341-0 from training. Duration: 27.14
2023-05-10 15:01:53,333 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0044-62348-0 from training. Duration: 22.44
2023-05-10 15:01:57,170 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0060-62364-0_sp0.9 from training. Duration: 21.361125
2023-05-10 15:01:58,078 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0079-62383-0_sp1.1 from training. Duration: 27.0318125
2023-05-10 15:01:58,554 WARNING [train.py:1182] (0/2) Exclude cut with ID 5622-44585-0006-90525-0_sp0.9 from training. Duration: 28.638875
2023-05-10 15:01:59,324 WARNING [train.py:1182] (0/2) Exclude cut with ID 3340-169293-0054-76830-0 from training. Duration: 20.4
2023-05-10 15:02:00,845 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0071-62375-0 from training. Duration: 20.025
2023-05-10 15:02:00,857 WARNING [train.py:1182] (0/2) Exclude cut with ID 2364-131735-0112-64612-0_sp0.9 from training. Duration: 20.488875
2023-05-10 15:02:01,130 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0079-62383-0 from training. Duration: 29.735
2023-05-10 15:02:05,742 WARNING [train.py:1182] (0/2) Exclude cut with ID 7276-92427-0014-12983-0_sp0.9 from training. Duration: 21.3055625
2023-05-10 15:02:05,812 WARNING [train.py:1182] (0/2) Exclude cut with ID 1025-75365-0008-79168-0_sp0.9 from training. Duration: 22.0666875
2023-05-10 15:02:11,581 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0062-62366-0 from training. Duration: 20.26
2023-05-10 15:02:12,260 WARNING [train.py:1182] (0/2) Exclude cut with ID 5239-32139-0030-9324-0_sp0.9 from training. Duration: 21.3444375
2023-05-10 15:02:15,035 WARNING [train.py:1182] (0/2) Exclude cut with ID 497-129325-0061-62254-0_sp1.1 from training. Duration: 0.97725
2023-05-10 15:02:17,712 WARNING [train.py:1182] (0/2) Exclude cut with ID 7395-89880-0031-39906-0_sp0.9 from training. Duration: 22.97225
2023-05-10 15:02:19,248 WARNING [train.py:1182] (0/2) Exclude cut with ID 7395-89880-0047-39922-0_sp0.9 from training. Duration: 21.97775
2023-05-10 15:02:19,915 WARNING [train.py:1182] (0/2) Exclude cut with ID 1112-1043-0006-89194-0_sp0.9 from training. Duration: 21.8333125
2023-05-10 15:02:20,433 WARNING [train.py:1182] (0/2) Exclude cut with ID 1914-133440-0031-94921-0 from training. Duration: 20.47
2023-05-10 15:02:24,228 WARNING [train.py:1182] (0/2) Exclude cut with ID 7395-89880-0037-39912-0_sp0.9 from training. Duration: 20.67225
2023-05-10 15:02:25,153 WARNING [train.py:1182] (0/2) Exclude cut with ID 1914-133440-0024-94914-0_sp0.9 from training. Duration: 25.2444375
2023-05-10 15:02:26,335 WARNING [train.py:1182] (0/2) Exclude cut with ID 3340-169293-0021-76797-0_sp0.9 from training. Duration: 21.1445
2023-05-10 15:02:30,598 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0079-62383-0_sp0.9 from training. Duration: 33.038875
2023-05-10 15:02:32,475 WARNING [train.py:1182] (0/2) Exclude cut with ID 6426-64291-0000-16059-0_sp0.9 from training. Duration: 20.0944375
2023-05-10 15:02:33,216 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0044-62348-0_sp1.1 from training. Duration: 20.4
2023-05-10 15:02:33,604 WARNING [train.py:1182] (0/2) Exclude cut with ID 6330-62851-0022-91297-0 from training. Duration: 20.085
2023-05-10 15:02:34,128 WARNING [train.py:1182] (0/2) Exclude cut with ID 4860-13185-0032-76709-0_sp0.9 from training. Duration: 23.07775
2023-05-10 15:02:37,023 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0044-62348-0_sp0.9 from training. Duration: 24.9333125
2023-05-10 15:02:39,144 WARNING [train.py:1182] (0/2) Exclude cut with ID 5118-111612-0016-124680-0_sp0.9 from training. Duration: 20.388875
2023-05-10 15:02:39,461 WARNING [train.py:1182] (0/2) Exclude cut with ID 432-122774-0017-62487-0_sp1.1 from training. Duration: 20.3590625
2023-05-10 15:02:43,331 WARNING [train.py:1182] (0/2) Exclude cut with ID 3557-8342-0013-54691-0_sp1.1 from training. Duration: 0.836375
2023-05-10 15:02:45,372 WARNING [train.py:1182] (0/2) Exclude cut with ID 8565-290391-0049-67394-0_sp0.9 from training. Duration: 21.3166875
2023-05-10 15:02:46,831 WARNING [train.py:1182] (0/2) Exclude cut with ID 6533-399-0029-104863-0_sp0.9 from training. Duration: 22.1055625
2023-05-10 15:02:47,962 WARNING [train.py:1182] (0/2) Exclude cut with ID 7699-105389-0094-26379-0_sp1.1 from training. Duration: 21.77725
2023-05-10 15:02:48,918 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0005-134304-0_sp0.9 from training. Duration: 27.8166875
2023-05-10 15:02:50,089 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0021-15852-0_sp1.1 from training. Duration: 22.5090625
2023-05-10 15:02:50,384 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0005-134304-0 from training. Duration: 25.035
2023-05-10 15:02:51,220 WARNING [train.py:1182] (0/2) Exclude cut with ID 774-127930-0014-10412-0_sp1.1 from training. Duration: 0.95
2023-05-10 15:02:52,106 WARNING [train.py:1182] (0/2) Exclude cut with ID 3033-130750-0096-55598-0_sp0.9 from training. Duration: 0.92225
2023-05-10 15:02:53,934 WARNING [train.py:1182] (0/2) Exclude cut with ID 4511-76322-0006-80011-0 from training. Duration: 21.97
2023-05-10 15:02:54,838 WARNING [train.py:1182] (0/2) Exclude cut with ID 7492-105653-0055-62765-0_sp0.9 from training. Duration: 21.97225
2023-05-10 15:02:54,872 WARNING [train.py:1182] (0/2) Exclude cut with ID 453-131332-0000-47844-0_sp0.9 from training. Duration: 25.3333125
2023-05-10 15:02:55,349 WARNING [train.py:1182] (0/2) Exclude cut with ID 5172-29468-0015-19128-0_sp0.9 from training. Duration: 21.5055625
2023-05-10 15:02:55,788 WARNING [train.py:1182] (0/2) Exclude cut with ID 453-131332-0000-47844-0_sp1.1 from training. Duration: 20.72725
2023-05-10 15:02:57,380 WARNING [train.py:1182] (0/2) Exclude cut with ID 8631-249866-0030-130156-0_sp0.9 from training. Duration: 26.32775
2023-05-10 15:02:58,871 WARNING [train.py:1182] (0/2) Exclude cut with ID 3867-173237-0077-144769-0 from training. Duration: 20.025
2023-05-10 15:02:59,708 WARNING [train.py:1182] (0/2) Exclude cut with ID 6709-74022-0004-86860-0_sp1.1 from training. Duration: 0.9409375
2023-05-10 15:02:59,717 WARNING [train.py:1182] (0/2) Exclude cut with ID 4757-1811-0023-62229-0_sp0.9 from training. Duration: 21.37775
2023-05-10 15:03:00,718 WARNING [train.py:1182] (0/2) Exclude cut with ID 1250-135782-0004-25974-0_sp0.9 from training. Duration: 21.17225
2023-05-10 15:03:00,728 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0021-15852-0_sp0.9 from training. Duration: 27.511125
2023-05-10 15:03:02,286 WARNING [train.py:1182] (0/2) Exclude cut with ID 453-131332-0000-47844-0 from training. Duration: 22.8
2023-05-10 15:03:02,509 WARNING [train.py:1182] (0/2) Exclude cut with ID 4964-30587-0040-44509-0 from training. Duration: 22.585
2023-05-10 15:03:03,962 WARNING [train.py:1182] (0/2) Exclude cut with ID 6978-92210-0001-146967-0_sp0.9 from training. Duration: 22.0166875
2023-05-10 15:03:04,707 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0003-134302-0_sp1.1 from training. Duration: 24.395375
2023-05-10 15:03:05,657 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-85273-0017-41203-0_sp0.9 from training. Duration: 27.47775
2023-05-10 15:03:05,862 WARNING [train.py:1182] (0/2) Exclude cut with ID 432-122774-0017-62487-0_sp0.9 from training. Duration: 24.8833125
2023-05-10 15:03:06,001 WARNING [train.py:1182] (0/2) Exclude cut with ID 6758-72288-0033-108368-0 from training. Duration: 23.39
2023-05-10 15:03:06,304 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0007-12994-0_sp0.9 from training. Duration: 28.72225
2023-05-10 15:03:06,779 WARNING [train.py:1182] (0/2) Exclude cut with ID 585-294811-0110-133686-0_sp0.9 from training. Duration: 20.8944375
2023-05-10 15:03:07,482 WARNING [train.py:1182] (0/2) Exclude cut with ID 5796-66357-0007-116447-0_sp0.9 from training. Duration: 23.8444375
2023-05-10 15:03:08,679 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0007-12994-0 from training. Duration: 25.85
2023-05-10 15:03:08,688 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0023-13010-0 from training. Duration: 21.39
2023-05-10 15:03:09,199 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0014-15845-0 from training. Duration: 27.92
2023-05-10 15:03:10,523 WARNING [train.py:1182] (0/2) Exclude cut with ID 8631-249866-0039-130165-0_sp0.9 from training. Duration: 20.661125
2023-05-10 15:03:12,254 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0043-15874-0_sp0.9 from training. Duration: 20.07225
2023-05-10 15:03:12,647 WARNING [train.py:1182] (0/2) Exclude cut with ID 1085-156170-0017-128270-0 from training. Duration: 21.01
2023-05-10 15:03:15,913 WARNING [train.py:1182] (0/2) Exclude cut with ID 2195-150901-0045-59933-0 from training. Duration: 20.65
2023-05-10 15:03:16,263 WARNING [train.py:1182] (0/2) Exclude cut with ID 5796-66357-0007-116447-0 from training. Duration: 21.46
2023-05-10 15:03:19,074 WARNING [train.py:1182] (0/2) Exclude cut with ID 3557-8342-0013-54691-0 from training. Duration: 0.92
2023-05-10 15:03:19,403 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0023-13010-0_sp0.9 from training. Duration: 23.7666875
2023-05-10 15:03:21,145 WARNING [train.py:1182] (0/2) Exclude cut with ID 8544-281189-0060-101339-0_sp0.9 from training. Duration: 20.861125
2023-05-10 15:03:21,634 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-65654-0031-41259-0_sp0.9 from training. Duration: 22.711125
2023-05-10 15:03:24,241 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0043-132310-0_sp1.1 from training. Duration: 22.986375
2023-05-10 15:03:25,018 WARNING [train.py:1182] (0/2) Exclude cut with ID 8040-260924-0003-80960-0_sp0.9 from training. Duration: 22.07225
2023-05-10 15:03:25,251 WARNING [train.py:1182] (0/2) Exclude cut with ID 7699-105389-0045-26330-0_sp0.9 from training. Duration: 20.3055625
2023-05-10 15:03:25,373 WARNING [train.py:1182] (0/2) Exclude cut with ID 6356-271890-0060-94317-0_sp0.9 from training. Duration: 20.72225
2023-05-10 15:03:26,258 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-85273-0017-41203-0_sp1.1 from training. Duration: 22.4818125
2023-05-10 15:03:27,506 WARNING [train.py:1182] (0/2) Exclude cut with ID 4964-30587-0040-44509-0_sp0.9 from training. Duration: 25.0944375
2023-05-10 15:03:27,694 WARNING [train.py:1182] (0/2) Exclude cut with ID 6533-399-0047-104881-0 from training. Duration: 21.515
2023-05-10 15:03:28,002 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0009-15840-0_sp0.9 from training. Duration: 27.02225
2023-05-10 15:03:28,233 WARNING [train.py:1182] (0/2) Exclude cut with ID 432-122774-0010-62480-0_sp0.9 from training. Duration: 22.22225
2023-05-10 15:03:28,590 WARNING [train.py:1182] (0/2) Exclude cut with ID 4964-30587-0085-44554-0_sp0.9 from training. Duration: 20.85
2023-05-10 15:03:30,714 WARNING [train.py:1182] (0/2) Exclude cut with ID 4295-39940-0007-92567-0 from training. Duration: 21.54
2023-05-10 15:03:30,920 WARNING [train.py:1182] (0/2) Exclude cut with ID 4964-30587-0040-44509-0_sp1.1 from training. Duration: 20.5318125
2023-05-10 15:03:31,383 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0012-134311-0_sp0.9 from training. Duration: 21.9333125
2023-05-10 15:03:33,707 WARNING [train.py:1182] (0/2) Exclude cut with ID 8631-249866-0025-130151-0_sp0.9 from training. Duration: 21.7944375
2023-05-10 15:03:34,303 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0002-12989-0_sp0.9 from training. Duration: 22.4666875
2023-05-10 15:03:34,654 WARNING [train.py:1182] (0/2) Exclude cut with ID 6121-9014-0076-24124-0 from training. Duration: 21.635
2023-05-10 15:03:34,914 WARNING [train.py:1182] (0/2) Exclude cut with ID 6121-9014-0076-24124-0_sp0.9 from training. Duration: 24.038875
2023-05-10 15:03:37,562 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0009-134308-0_sp1.1 from training. Duration: 21.786375
2023-05-10 15:03:38,138 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0002-12989-0 from training. Duration: 20.22
2023-05-10 15:03:44,460 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0043-132310-0 from training. Duration: 25.285
2023-05-10 15:03:48,136 WARNING [train.py:1182] (0/2) Exclude cut with ID 811-130148-0001-63453-0_sp0.9 from training. Duration: 20.861125
2023-05-10 15:03:49,199 WARNING [train.py:1182] (0/2) Exclude cut with ID 6010-56788-0055-90261-0 from training. Duration: 20.88
2023-05-10 15:03:50,707 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0045-15876-0_sp0.9 from training. Duration: 23.4166875
2023-05-10 15:03:54,944 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0006-134305-0 from training. Duration: 21.24
2023-05-10 15:03:54,958 WARNING [train.py:1182] (0/2) Exclude cut with ID 6533-399-0047-104881-0_sp0.9 from training. Duration: 23.9055625
2023-05-10 15:03:56,570 WARNING [train.py:1182] (0/2) Exclude cut with ID 6758-72288-0033-108368-0_sp0.9 from training. Duration: 25.988875
2023-05-10 15:03:56,981 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0001-134300-0_sp0.9 from training. Duration: 20.67225
2023-05-10 15:03:59,858 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-85273-0038-41224-0 from training. Duration: 20.34
2023-05-10 15:04:03,358 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0026-15857-0_sp0.9 from training. Duration: 25.061125
2023-05-10 15:04:03,931 WARNING [train.py:1182] (0/2) Exclude cut with ID 3033-130750-0096-55598-0 from training. Duration: 0.83
2023-05-10 15:04:05,765 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-85273-0017-41203-0 from training. Duration: 24.73
2023-05-10 15:04:06,372 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0009-134308-0 from training. Duration: 23.965
2023-05-10 15:04:06,794 WARNING [train.py:1182] (0/2) Exclude cut with ID 6978-92210-0030-146996-0_sp0.9 from training. Duration: 22.088875
2023-05-10 15:04:07,604 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0006-134305-0_sp0.9 from training. Duration: 23.6
2023-05-10 15:04:13,088 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0024-13011-0 from training. Duration: 23.795
2023-05-10 15:04:13,873 WARNING [train.py:1182] (0/2) Exclude cut with ID 8631-249866-0030-130156-0_sp1.1 from training. Duration: 21.5409375
2023-05-10 15:04:14,008 WARNING [train.py:1182] (0/2) Exclude cut with ID 6978-92210-0019-146985-0_sp0.9 from training. Duration: 24.97775
2023-05-10 15:04:14,538 WARNING [train.py:1182] (0/2) Exclude cut with ID 1085-156170-0017-128270-0_sp0.9 from training. Duration: 23.3444375
2023-05-10 15:04:15,909 WARNING [train.py:1182] (0/2) Exclude cut with ID 6010-56788-0055-90261-0_sp0.9 from training. Duration: 23.2
2023-05-10 15:04:16,179 WARNING [train.py:1182] (0/2) Exclude cut with ID 5653-46179-0060-117930-0_sp0.9 from training. Duration: 21.17225
2023-05-10 15:04:17,140 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0008-134307-0_sp0.9 from training. Duration: 24.6555625
2023-05-10 15:04:20,432 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-65654-0031-41259-0 from training. Duration: 20.44
2023-05-10 15:04:21,153 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0018-132285-0_sp0.9 from training. Duration: 23.45
2023-05-10 15:04:22,602 WARNING [train.py:1182] (0/2) Exclude cut with ID 6945-60535-0076-12784-0_sp0.9 from training. Duration: 20.52225
2023-05-10 15:04:22,963 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0008-134307-0 from training. Duration: 22.19
2023-05-10 15:04:24,050 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0014-15845-0_sp1.1 from training. Duration: 25.3818125
2023-05-10 15:04:24,872 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0043-132310-0_sp0.9 from training. Duration: 28.0944375
2023-05-10 15:04:25,155 WARNING [train.py:1182] (0/2) Exclude cut with ID 2195-150901-0045-59933-0_sp0.9 from training. Duration: 22.9444375
2023-05-10 15:04:25,554 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0024-13011-0_sp1.1 from training. Duration: 21.6318125
2023-05-10 15:04:26,353 WARNING [train.py:1182] (0/2) Exclude cut with ID 8631-249866-0030-130156-0 from training. Duration: 23.695
2023-05-10 15:04:27,576 WARNING [train.py:1182] (0/2) Exclude cut with ID 7699-105389-0094-26379-0 from training. Duration: 23.955
2023-05-10 15:04:29,795 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0024-13011-0_sp0.9 from training. Duration: 26.438875
2023-05-10 15:04:31,950 WARNING [train.py:1182] (0/2) Exclude cut with ID 7699-105389-0021-26306-0_sp0.9 from training. Duration: 21.2444375
2023-05-10 15:04:31,994 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0014-15845-0_sp0.9 from training. Duration: 31.02225
2023-05-10 15:04:32,479 WARNING [train.py:1182] (0/2) Exclude cut with ID 432-122774-0017-62487-0 from training. Duration: 22.395
2023-05-10 15:04:33,275 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0045-15876-0 from training. Duration: 21.075
2023-05-10 15:04:33,515 WARNING [train.py:1182] (0/2) Exclude cut with ID 6482-98857-0025-147532-0_sp0.9 from training. Duration: 20.0055625
2023-05-10 15:04:33,534 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0037-132304-0_sp0.9 from training. Duration: 22.05
2023-05-10 15:04:33,545 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0003-134302-0 from training. Duration: 26.8349375
2023-05-10 15:04:33,704 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0009-15840-0_sp1.1 from training. Duration: 22.1090625
2023-05-10 15:04:34,036 WARNING [train.py:1182] (0/2) Exclude cut with ID 7699-105389-0094-26379-0_sp0.9 from training. Duration: 26.6166875
2023-05-10 15:04:34,893 WARNING [train.py:1182] (0/2) Exclude cut with ID 2046-178027-0000-53705-0_sp0.9 from training. Duration: 20.3055625
2023-05-10 15:04:36,571 WARNING [train.py:1182] (0/2) Exclude cut with ID 7205-50138-0008-5373-0_sp0.9 from training. Duration: 20.7
2023-05-10 15:04:38,665 WARNING [train.py:1182] (0/2) Exclude cut with ID 6978-92210-0019-146985-0 from training. Duration: 22.48
2023-05-10 15:04:39,418 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0003-134302-0_sp0.9 from training. Duration: 29.816625
2023-05-10 15:04:40,363 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0005-134304-0_sp1.1 from training. Duration: 22.7590625
2023-05-10 15:04:40,622 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0026-15857-0 from training. Duration: 22.555
2023-05-10 15:04:42,198 WARNING [train.py:1182] (0/2) Exclude cut with ID 1250-135782-0005-25975-0_sp0.9 from training. Duration: 21.688875
2023-05-10 15:04:43,857 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-85273-0038-41224-0_sp0.9 from training. Duration: 22.6
2023-05-10 15:04:45,692 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0009-15840-0 from training. Duration: 24.32
2023-05-10 15:04:49,096 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-276745-0093-13116-0_sp0.9 from training. Duration: 21.061125
2023-05-10 15:04:49,769 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0024-15855-0_sp0.9 from training. Duration: 20.32225
2023-05-10 15:04:50,384 WARNING [train.py:1182] (0/2) Exclude cut with ID 3033-130750-0096-55598-0_sp1.1 from training. Duration: 0.7545625
2023-05-10 15:04:51,102 WARNING [train.py:1182] (0/2) Exclude cut with ID 4295-39940-0007-92567-0_sp0.9 from training. Duration: 23.9333125
2023-05-10 15:04:52,655 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0008-134307-0_sp1.1 from training. Duration: 20.17275
2023-05-10 15:04:52,924 WARNING [train.py:1182] (0/2) Exclude cut with ID 6978-92210-0019-146985-0_sp1.1 from training. Duration: 20.436375
2023-05-10 15:04:57,441 WARNING [train.py:1182] (0/2) Exclude cut with ID 4234-40345-0022-142709-0_sp0.9 from training. Duration: 23.1055625
2023-05-10 15:04:57,538 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0007-12994-0_sp1.1 from training. Duration: 23.5
2023-05-10 15:04:58,059 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0009-134308-0_sp0.9 from training. Duration: 26.62775
2023-05-10 15:04:58,656 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0018-132285-0 from training. Duration: 21.105
2023-05-10 15:04:58,867 WARNING [train.py:1182] (0/2) Exclude cut with ID 4511-76322-0006-80011-0_sp0.9 from training. Duration: 24.411125
2023-05-10 15:05:00,753 WARNING [train.py:1182] (0/2) Exclude cut with ID 6758-72288-0033-108368-0_sp1.1 from training. Duration: 21.263625
2023-05-10 15:05:02,300 WARNING [train.py:1182] (0/2) Exclude cut with ID 4234-40345-0022-142709-0 from training. Duration: 20.795
2023-05-10 15:05:02,799 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0021-15852-0 from training. Duration: 24.76
2023-05-10 15:05:02,816 WARNING [train.py:1182] (0/2) Exclude cut with ID 3867-173237-0077-144769-0_sp0.9 from training. Duration: 22.25
2023-05-10 15:05:04,013 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0026-15857-0_sp1.1 from training. Duration: 20.5045625
2023-05-10 15:05:20,204 INFO [train.py:1357] (0/2) Maximum memory allocated so far is 17010MB
2023-05-10 15:05:23,134 INFO [train.py:1357] (0/2) Maximum memory allocated so far is 17953MB
2023-05-10 15:05:26,273 INFO [train.py:1357] (0/2) Maximum memory allocated so far is 17953MB
2023-05-10 15:05:29,339 INFO [train.py:1357] (0/2) Maximum memory allocated so far is 17953MB
2023-05-10 15:05:31,667 INFO [scaling.py:969] (0/2) Whitening: name=None, num_groups=1, num_channels=256, metric=12.93 vs. limit=7.5
2023-05-10 15:05:32,417 INFO [train.py:1357] (0/2) Maximum memory allocated so far is 17953MB
2023-05-10 15:05:35,464 INFO [train.py:1357] (0/2) Maximum memory allocated so far is 17953MB
2023-05-10 15:05:35,482 INFO [train.py:1238] (0/2) Loading grad scaler state dict
2023-05-10 15:05:50,092 WARNING [train.py:1182] (0/2) Exclude cut with ID 7859-102521-0017-7548-0_sp1.1 from training. Duration: 22.2954375
2023-05-10 15:05:56,780 INFO [train.py:1021] (0/2) Epoch 31, batch 0, loss[loss=0.167, simple_loss=0.257, pruned_loss=0.03845, over 36941.00 frames. ], tot_loss[loss=0.167, simple_loss=0.257, pruned_loss=0.03845, over 36941.00 frames. ], batch size: 100, lr: 3.57e-03, grad_scale: 32.0
2023-05-10 15:05:56,781 INFO [train.py:1048] (0/2) Computing validation loss
2023-05-10 15:06:07,469 INFO [train.py:1057] (0/2) Epoch 31, validation: loss=0.1535, simple_loss=0.2545, pruned_loss=0.02622, over 944034.00 frames.
2023-05-10 15:06:07,469 INFO [train.py:1058] (0/2) Maximum memory allocated so far is 17953MB
2023-05-10 15:06:12,770 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=15.0
2023-05-10 15:06:29,786 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=7.93 vs. limit=22.5
2023-05-10 15:06:55,333 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=545560.0, ans=0.0
2023-05-10 15:07:02,567 WARNING [train.py:1182] (0/2) Exclude cut with ID 298-126791-0067-24026-0_sp0.9 from training. Duration: 21.438875
2023-05-10 15:07:04,393 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3459, 4.5028, 2.3061, 2.4828], device='cuda:0')
2023-05-10 15:07:08,566 WARNING [train.py:1182] (0/2) Exclude cut with ID 5652-39938-0025-23684-0_sp0.9 from training. Duration: 22.2055625
2023-05-10 15:07:16,376 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=545610.0, ans=0.0
2023-05-10 15:07:25,358 INFO [train.py:1021] (0/2) Epoch 31, batch 50, loss[loss=0.1416, simple_loss=0.2292, pruned_loss=0.02703, over 37064.00 frames. ], tot_loss[loss=0.1672, simple_loss=0.2613, pruned_loss=0.03651, over 1643928.81 frames. ], batch size: 88, lr: 3.57e-03, grad_scale: 32.0
2023-05-10 15:07:27,343 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=545660.0, ans=0.025
2023-05-10 15:08:12,091 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.9721, 4.1739, 4.5639, 4.5638], device='cuda:0')
2023-05-10 15:08:13,524 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=545810.0, ans=0.0
2023-05-10 15:08:25,314 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5341, 4.1017, 3.8026, 4.1043, 3.4638, 3.0688, 3.5389, 3.0513],
device='cuda:0')
2023-05-10 15:08:35,741 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.352e+02 3.107e+02 3.758e+02 4.562e+02 7.673e+02, threshold=7.517e+02, percent-clipped=0.0
2023-05-10 15:08:41,826 INFO [train.py:1021] (0/2) Epoch 31, batch 100, loss[loss=0.1675, simple_loss=0.2619, pruned_loss=0.0365, over 37094.00 frames. ], tot_loss[loss=0.1672, simple_loss=0.2603, pruned_loss=0.03708, over 2877976.87 frames. ], batch size: 103, lr: 3.56e-03, grad_scale: 32.0
2023-05-10 15:08:48,926 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9856, 4.2904, 2.6214, 3.2048], device='cuda:0')
2023-05-10 15:08:53,450 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=545910.0, ans=0.125
2023-05-10 15:09:06,860 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=545960.0, ans=0.0
2023-05-10 15:09:59,073 INFO [train.py:1021] (0/2) Epoch 31, batch 150, loss[loss=0.1815, simple_loss=0.2734, pruned_loss=0.04482, over 37062.00 frames. ], tot_loss[loss=0.1663, simple_loss=0.2589, pruned_loss=0.03688, over 3839797.66 frames. ], batch size: 116, lr: 3.56e-03, grad_scale: 32.0
2023-05-10 15:10:17,136 WARNING [train.py:1182] (0/2) Exclude cut with ID 7859-102521-0017-7548-0 from training. Duration: 24.525
2023-05-10 15:10:25,753 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=22.5
2023-05-10 15:10:30,025 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.7997, 4.1541, 2.9814, 2.8091], device='cuda:0')
2023-05-10 15:10:36,543 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=546260.0, ans=0.125
2023-05-10 15:10:36,660 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.6982, 3.8095, 4.2233, 3.7726], device='cuda:0')
2023-05-10 15:10:40,984 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=546260.0, ans=0.125
2023-05-10 15:10:42,686 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6216, 4.0159, 2.8898, 2.6416], device='cuda:0')
2023-05-10 15:10:54,259 WARNING [train.py:1182] (0/2) Exclude cut with ID 3699-47246-0007-3408-0_sp0.9 from training. Duration: 20.26675
2023-05-10 15:11:07,936 WARNING [train.py:1182] (0/2) Exclude cut with ID 7859-102521-0017-7548-0_sp0.9 from training. Duration: 27.25
2023-05-10 15:11:09,303 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.288e+02 3.040e+02 3.340e+02 4.209e+02 6.958e+02, threshold=6.680e+02, percent-clipped=1.0
2023-05-10 15:11:09,792 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=546360.0, ans=0.1
2023-05-10 15:11:15,505 INFO [train.py:1021] (0/2) Epoch 31, batch 200, loss[loss=0.1726, simple_loss=0.2642, pruned_loss=0.0405, over 34822.00 frames. ], tot_loss[loss=0.1651, simple_loss=0.2572, pruned_loss=0.03649, over 4560889.82 frames. ], batch size: 145, lr: 3.56e-03, grad_scale: 32.0
2023-05-10 15:11:25,731 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=546410.0, ans=0.0
2023-05-10 15:11:45,989 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.5995, 5.4354, 4.6645, 5.1946], device='cuda:0')
2023-05-10 15:11:50,546 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=546510.0, ans=0.125
2023-05-10 15:12:01,288 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=546560.0, ans=0.125
2023-05-10 15:12:13,931 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=546560.0, ans=0.1
2023-05-10 15:12:29,266 WARNING [train.py:1182] (0/2) Exclude cut with ID 6426-64292-0017-15984-0 from training. Duration: 21.68
2023-05-10 15:12:33,253 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.31 vs. limit=15.0
2023-05-10 15:12:33,770 INFO [train.py:1021] (0/2) Epoch 31, batch 250, loss[loss=0.1395, simple_loss=0.2241, pruned_loss=0.02743, over 35347.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.2558, pruned_loss=0.03575, over 5156223.85 frames. ], batch size: 78, lr: 3.56e-03, grad_scale: 32.0
2023-05-10 15:12:39,738 WARNING [train.py:1182] (0/2) Exclude cut with ID 4278-13270-0007-59342-0 from training. Duration: 21.6300625
2023-05-10 15:13:05,085 WARNING [train.py:1182] (0/2) Exclude cut with ID 4278-13270-0007-59342-0_sp0.9 from training. Duration: 24.033375
2023-05-10 15:13:07,469 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=546760.0, ans=0.0
2023-05-10 15:13:09,674 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2023-05-10 15:13:27,790 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=546810.0, ans=0.125
2023-05-10 15:13:27,859 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=546810.0, ans=0.125
2023-05-10 15:13:43,963 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.396e+02 2.848e+02 3.242e+02 3.786e+02 5.651e+02, threshold=6.485e+02, percent-clipped=0.0
2023-05-10 15:13:47,547 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.94 vs. limit=10.0
2023-05-10 15:13:49,854 INFO [train.py:1021] (0/2) Epoch 31, batch 300, loss[loss=0.1524, simple_loss=0.2374, pruned_loss=0.03367, over 37033.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2551, pruned_loss=0.03533, over 5627731.73 frames. ], batch size: 88, lr: 3.56e-03, grad_scale: 32.0
2023-05-10 15:14:01,485 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-05-10 15:14:07,108 WARNING [train.py:1182] (0/2) Exclude cut with ID 4278-13270-0009-59344-0 from training. Duration: 22.905
2023-05-10 15:14:08,643 WARNING [train.py:1182] (0/2) Exclude cut with ID 5622-44585-0006-90525-0_sp1.1 from training. Duration: 23.4318125
2023-05-10 15:14:16,643 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=546960.0, ans=0.1
2023-05-10 15:14:21,445 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=547010.0, ans=0.2
2023-05-10 15:14:55,388 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=547110.0, ans=0.2
2023-05-10 15:14:55,390 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=547110.0, ans=0.125
2023-05-10 15:15:07,217 INFO [train.py:1021] (0/2) Epoch 31, batch 350, loss[loss=0.1602, simple_loss=0.2412, pruned_loss=0.03962, over 35277.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2549, pruned_loss=0.03541, over 5976993.81 frames. ], batch size: 78, lr: 3.56e-03, grad_scale: 32.0
2023-05-10 15:15:10,653 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=547160.0, ans=0.2
2023-05-10 15:15:47,160 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=547260.0, ans=0.2
2023-05-10 15:16:12,529 WARNING [train.py:1182] (0/2) Exclude cut with ID 4278-13270-0009-59344-0_sp1.1 from training. Duration: 20.82275
2023-05-10 15:16:14,100 WARNING [train.py:1182] (0/2) Exclude cut with ID 4278-13270-0009-59344-0_sp0.9 from training. Duration: 25.45
2023-05-10 15:16:18,347 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.673e+02 3.280e+02 4.154e+02 5.203e+02 7.776e+02, threshold=8.309e+02, percent-clipped=9.0
2023-05-10 15:16:24,509 INFO [train.py:1021] (0/2) Epoch 31, batch 400, loss[loss=0.175, simple_loss=0.2709, pruned_loss=0.03955, over 36761.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2553, pruned_loss=0.03536, over 6267821.83 frames. ], batch size: 122, lr: 3.56e-03, grad_scale: 32.0
2023-05-10 15:16:46,616 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4141, 3.7811, 3.9728, 3.7991], device='cuda:0')
2023-05-10 15:17:08,277 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=547510.0, ans=0.125
2023-05-10 15:17:13,181 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=22.5
2023-05-10 15:17:15,540 WARNING [train.py:1182] (0/2) Exclude cut with ID 5622-44585-0006-90525-0 from training. Duration: 25.775
2023-05-10 15:17:29,199 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=547610.0, ans=0.0
2023-05-10 15:17:35,656 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0
2023-05-10 15:17:38,674 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0071-62375-0_sp0.9 from training. Duration: 22.25
2023-05-10 15:17:41,632 INFO [train.py:1021] (0/2) Epoch 31, batch 450, loss[loss=0.1658, simple_loss=0.2569, pruned_loss=0.03736, over 37012.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.2567, pruned_loss=0.036, over 6448129.47 frames. ], batch size: 99, lr: 3.56e-03, grad_scale: 32.0
2023-05-10 15:18:08,159 WARNING [train.py:1182] (0/2) Exclude cut with ID 3972-170212-0014-23379-0 from training. Duration: 26.205
2023-05-10 15:18:09,891 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=547710.0, ans=0.125
2023-05-10 15:18:17,614 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=547760.0, ans=0.125
2023-05-10 15:18:26,222 WARNING [train.py:1182] (0/2) Exclude cut with ID 5239-32139-0047-9341-0_sp0.9 from training. Duration: 30.1555625
2023-05-10 15:18:30,801 WARNING [train.py:1182] (0/2) Exclude cut with ID 1265-135635-0050-6781-0_sp0.9 from training. Duration: 21.8333125
2023-05-10 15:18:40,287 WARNING [train.py:1182] (0/2) Exclude cut with ID 1914-133440-0024-94914-0_sp1.1 from training. Duration: 20.6545625
2023-05-10 15:18:44,891 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=547860.0, ans=0.05
2023-05-10 15:18:49,306 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=547860.0, ans=0.025
2023-05-10 15:18:50,893 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=547860.0, ans=0.2
2023-05-10 15:18:53,144 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.47 vs. limit=15.0
2023-05-10 15:18:53,486 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.465e+02 3.105e+02 3.561e+02 4.354e+02 6.456e+02, threshold=7.122e+02, percent-clipped=0.0
2023-05-10 15:18:58,051 INFO [train.py:1021] (0/2) Epoch 31, batch 500, loss[loss=0.1771, simple_loss=0.2723, pruned_loss=0.04099, over 36383.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2568, pruned_loss=0.03615, over 6616698.03 frames. ], batch size: 126, lr: 3.56e-03, grad_scale: 16.0
2023-05-10 15:19:04,896 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=547910.0, ans=0.0
2023-05-10 15:19:26,459 WARNING [train.py:1182] (0/2) Exclude cut with ID 7395-89880-0045-39920-0_sp0.9 from training. Duration: 20.52225
2023-05-10 15:19:34,201 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.0147, 5.1473, 5.2880, 5.8740], device='cuda:0')
2023-05-10 15:19:45,756 WARNING [train.py:1182] (0/2) Exclude cut with ID 3972-170212-0014-23379-0_sp0.9 from training. Duration: 29.1166875
2023-05-10 15:19:46,650 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0
2023-05-10 15:19:47,543 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=548060.0, ans=0.2
2023-05-10 15:19:52,469 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9288, 2.9557, 4.4573, 3.1836], device='cuda:0')
2023-05-10 15:20:14,646 INFO [train.py:1021] (0/2) Epoch 31, batch 550, loss[loss=0.1767, simple_loss=0.277, pruned_loss=0.03819, over 36828.00 frames. ], tot_loss[loss=0.1656, simple_loss=0.2581, pruned_loss=0.03654, over 6745893.33 frames. ], batch size: 111, lr: 3.56e-03, grad_scale: 16.0
2023-05-10 15:20:23,253 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=548160.0, ans=0.0
2023-05-10 15:20:47,658 WARNING [train.py:1182] (0/2) Exclude cut with ID 543-133211-0007-59831-0_sp0.9 from training. Duration: 21.388875
2023-05-10 15:20:52,327 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=548260.0, ans=0.1
2023-05-10 15:21:03,470 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0
2023-05-10 15:21:25,891 WARNING [train.py:1182] (0/2) Exclude cut with ID 1914-133440-0024-94914-0 from training. Duration: 22.72
2023-05-10 15:21:27,148 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.540e+02 3.228e+02 4.169e+02 5.214e+02 9.380e+02, threshold=8.337e+02, percent-clipped=5.0
2023-05-10 15:21:27,308 WARNING [train.py:1182] (0/2) Exclude cut with ID 1914-133440-0031-94921-0_sp0.9 from training. Duration: 22.7444375
2023-05-10 15:21:31,809 INFO [train.py:1021] (0/2) Epoch 31, batch 600, loss[loss=0.1548, simple_loss=0.24, pruned_loss=0.03486, over 34463.00 frames. ], tot_loss[loss=0.1666, simple_loss=0.2594, pruned_loss=0.03693, over 6852460.42 frames. ], batch size: 76, lr: 3.56e-03, grad_scale: 16.0
2023-05-10 15:21:56,812 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-05-10 15:22:04,110 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=548510.0, ans=0.125
2023-05-10 15:22:16,231 WARNING [train.py:1182] (0/2) Exclude cut with ID 4133-6541-0027-40495-0_sp1.1 from training. Duration: 0.9681875
2023-05-10 15:22:16,630 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=548560.0, ans=0.2
2023-05-10 15:22:19,392 WARNING [train.py:1182] (0/2) Exclude cut with ID 6330-62851-0022-91297-0_sp0.9 from training. Duration: 22.3166875
2023-05-10 15:22:25,509 WARNING [train.py:1182] (0/2) Exclude cut with ID 543-133212-0015-59917-0_sp0.9 from training. Duration: 21.8166875
2023-05-10 15:22:48,259 INFO [train.py:1021] (0/2) Epoch 31, batch 650, loss[loss=0.1799, simple_loss=0.2758, pruned_loss=0.04199, over 36403.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.259, pruned_loss=0.03664, over 6960434.95 frames. ], batch size: 126, lr: 3.56e-03, grad_scale: 16.0
2023-05-10 15:23:18,605 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.40 vs. limit=22.5
2023-05-10 15:23:28,297 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=548760.0, ans=0.125
2023-05-10 15:24:00,843 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.447e+02 2.901e+02 3.227e+02 3.862e+02 5.542e+02, threshold=6.454e+02, percent-clipped=0.0
2023-05-10 15:24:05,301 INFO [train.py:1021] (0/2) Epoch 31, batch 700, loss[loss=0.1825, simple_loss=0.2744, pruned_loss=0.04533, over 36335.00 frames. ], tot_loss[loss=0.1655, simple_loss=0.258, pruned_loss=0.03649, over 7023249.67 frames. ], batch size: 126, lr: 3.55e-03, grad_scale: 16.0
2023-05-10 15:24:05,689 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=548910.0, ans=0.2
2023-05-10 15:24:14,533 WARNING [train.py:1182] (0/2) Exclude cut with ID 4957-30119-0041-23990-0_sp0.9 from training. Duration: 20.22775
2023-05-10 15:24:14,753 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=548910.0, ans=0.125
2023-05-10 15:25:00,730 WARNING [train.py:1182] (0/2) Exclude cut with ID 5239-32139-0047-9341-0_sp1.1 from training. Duration: 24.67275
2023-05-10 15:25:02,414 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.2901, 5.5967, 5.4186, 6.0037], device='cuda:0')
2023-05-10 15:25:08,464 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=549110.0, ans=0.0
2023-05-10 15:25:21,526 INFO [train.py:1021] (0/2) Epoch 31, batch 750, loss[loss=0.1491, simple_loss=0.2353, pruned_loss=0.03145, over 36815.00 frames. ], tot_loss[loss=0.1657, simple_loss=0.2587, pruned_loss=0.03641, over 7065736.92 frames. ], batch size: 89, lr: 3.55e-03, grad_scale: 16.0
2023-05-10 15:25:29,008 WARNING [train.py:1182] (0/2) Exclude cut with ID 3082-165428-0081-50734-0_sp0.9 from training. Duration: 21.8055625
2023-05-10 15:25:34,272 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=549160.0, ans=0.2
2023-05-10 15:25:48,468 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=549210.0, ans=0.2
2023-05-10 15:25:54,557 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=549260.0, ans=0.125
2023-05-10 15:26:09,154 WARNING [train.py:1182] (0/2) Exclude cut with ID 3340-169293-0054-76830-0_sp0.9 from training. Duration: 22.6666875
2023-05-10 15:26:29,960 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=15.0
2023-05-10 15:26:33,622 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.413e+02 3.048e+02 3.343e+02 3.928e+02 7.448e+02, threshold=6.687e+02, percent-clipped=2.0
2023-05-10 15:26:38,120 INFO [train.py:1021] (0/2) Epoch 31, batch 800, loss[loss=0.1777, simple_loss=0.2777, pruned_loss=0.03886, over 36910.00 frames. ], tot_loss[loss=0.1652, simple_loss=0.2581, pruned_loss=0.03619, over 7109184.95 frames. ], batch size: 105, lr: 3.55e-03, grad_scale: 32.0
2023-05-10 15:26:38,449 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=549410.0, ans=0.125
2023-05-10 15:26:48,010 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=549410.0, ans=0.025
2023-05-10 15:26:49,591 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=549410.0, ans=0.05
2023-05-10 15:27:01,614 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549460.0, ans=0.1
2023-05-10 15:27:06,141 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=549460.0, ans=0.0
2023-05-10 15:27:07,735 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=549510.0, ans=0.125
2023-05-10 15:27:13,693 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=549510.0, ans=0.0
2023-05-10 15:27:15,489 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=12.0
2023-05-10 15:27:16,290 WARNING [train.py:1182] (0/2) Exclude cut with ID 2411-132532-0017-82279-0_sp1.1 from training. Duration: 0.9681875
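Note on the train.py:1182 WARNINGs: they mark cuts that are dropped from training entirely. Judging by the durations logged in this section, cuts longer than roughly 20 s or shorter than about 1 s are excluded (the warning just above shows a 0.97 s cut; most others are 20 s or longer). Below is a minimal sketch of such a duration filter; the bounds and the function name are assumptions inferred from these messages, not the actual train.py code.

```python
# Illustrative duration filter in the spirit of the "Exclude cut" WARNINGs.
# Bounds are assumptions inferred from the durations seen in this log.
MIN_DURATION_SECONDS = 1.0
MAX_DURATION_SECONDS = 20.0


def keep_cut(cut_id: str, duration: float) -> bool:
    """Return True if the cut should be used for training."""
    if duration < MIN_DURATION_SECONDS or duration > MAX_DURATION_SECONDS:
        print(f"Exclude cut with ID {cut_id} from training. Duration: {duration}")
        return False
    return True


if __name__ == "__main__":
    keep_cut("5239-32139-0047-9341-0_sp1.1", 24.67275)     # too long, excluded
    keep_cut("2411-132532-0017-82279-0_sp1.1", 0.9681875)  # too short, excluded
    keep_cut("hypothetical-kept-cut", 12.3)                # within bounds, kept
```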
2023-05-10 15:27:26,283 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.4872, 5.3213, 4.5626, 5.0029], device='cuda:0')
2023-05-10 15:27:26,334 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.2636, 4.1427, 3.7908, 4.1250, 3.4344, 3.1745, 3.5796, 3.1339], device='cuda:0')
2023-05-10 15:27:27,685 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=549560.0, ans=0.125
2023-05-10 15:27:44,617 WARNING [train.py:1182] (0/2) Exclude cut with ID 6330-62850-0007-91323-0 from training. Duration: 22.485
2023-05-10 15:27:52,338 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4721, 3.5554, 3.3540, 4.2025, 2.6535, 3.6130, 4.2473, 3.6395], device='cuda:0')
2023-05-10 15:27:54,967 INFO [train.py:1021] (0/2) Epoch 31, batch 850, loss[loss=0.1488, simple_loss=0.2371, pruned_loss=0.03022, over 36955.00 frames. ], tot_loss[loss=0.1658, simple_loss=0.2585, pruned_loss=0.03649, over 7108479.19 frames. ], batch size: 86, lr: 3.55e-03, grad_scale: 16.0
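Note on the per-batch loss lines: across the batch 750, 800 and 850 entries above, the reported loss matches a weighted sum of the two transducer losses, loss = 0.5 * simple_loss + pruned_loss (for batch 750, 0.5 * 0.2353 + 0.03145 = 0.1491). The weight 0.5 is inferred from these numbers rather than quoted from the training code; a quick check:

```python
# Verify loss ~= 0.5 * simple_loss + pruned_loss using values copied from the
# batch 750, 800 and 850 log lines above. The 0.5 weight is an inference from
# the logged numbers.
examples = [
    (0.1491, 0.2353, 0.03145),  # batch 750
    (0.1777, 0.2777, 0.03886),  # batch 800
    (0.1488, 0.2371, 0.03022),  # batch 850
]
for loss, simple_loss, pruned_loss in examples:
    reconstructed = 0.5 * simple_loss + pruned_loss
    print(f"logged={loss:.4f}  0.5*simple+pruned={reconstructed:.4f}")
    assert abs(reconstructed - loss) < 5e-4
```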
2023-05-10 15:27:58,352 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=549660.0, ans=0.0
2023-05-10 15:28:11,976 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1431, 4.4626, 3.1460, 3.0267], device='cuda:0')
2023-05-10 15:28:24,308 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=549760.0, ans=0.0
2023-05-10 15:28:26,891 WARNING [train.py:1182] (0/2) Exclude cut with ID 3972-170212-0014-23379-0_sp1.1 from training. Duration: 23.82275
2023-05-10 15:28:39,419 WARNING [train.py:1182] (0/2) Exclude cut with ID 4860-13185-0032-76709-0 from training. Duration: 20.77
2023-05-10 15:28:48,515 WARNING [train.py:1182] (0/2) Exclude cut with ID 6426-64292-0017-15984-0_sp0.9 from training. Duration: 24.088875
2023-05-10 15:28:53,256 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=549810.0, ans=0.125
2023-05-10 15:28:56,377 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=549860.0, ans=0.0
2023-05-10 15:28:56,425 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=549860.0, ans=0.0
2023-05-10 15:29:07,772 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.440e+02 3.060e+02 3.523e+02 4.302e+02 6.165e+02, threshold=7.046e+02, percent-clipped=0.0
2023-05-10 15:29:10,736 INFO [train.py:1021] (0/2) Epoch 31, batch 900, loss[loss=0.1788, simple_loss=0.2764, pruned_loss=0.04061, over 36278.00 frames. ], tot_loss[loss=0.1659, simple_loss=0.2591, pruned_loss=0.03639, over 7157185.23 frames. ], batch size: 126, lr: 3.55e-03, grad_scale: 16.0
2023-05-10 15:29:18,935 WARNING [train.py:1182] (0/2) Exclude cut with ID 6330-62850-0007-91323-0_sp1.1 from training. Duration: 20.4409375
2023-05-10 15:29:28,790 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4508, 3.2890, 3.1630, 3.9988, 2.4069, 3.4667, 4.0582, 3.4956], device='cuda:0')
2023-05-10 15:29:28,823 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=549960.0, ans=0.2
2023-05-10 15:29:42,489 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=550010.0, ans=0.125
2023-05-10 15:29:44,446 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=15.0
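Note on the scaling.py:969 Whitening lines: each compares a whitening metric of a module's output against a limit (9.64 vs. limit=15.0 above). The metric is small when the feature covariance is close to a multiple of the identity and grows as channels become correlated or unevenly scaled. Below is a hedged sketch of one such metric, normalised so that perfectly white features score 1.0; it illustrates the idea only and is not necessarily the formula used in scaling.py (grouping via num_groups is omitted).

```python
# Hedged sketch of a whiteness metric: 1.0 when the channel covariance is a
# multiple of the identity, larger otherwise. Not necessarily the exact
# metric computed by icefall's Whiten module.
import torch


def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels)."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    num_channels = cov.shape[0]
    mean_diag = cov.diagonal().mean()
    return ((cov ** 2).mean() * num_channels / (mean_diag ** 2 + 1e-20)).item()


if __name__ == "__main__":
    torch.manual_seed(0)
    white = torch.randn(4000, 256)                 # roughly white features
    mixed = white @ (0.1 * torch.randn(256, 256))  # strongly correlated features
    print(f"white: {whitening_metric(white):.2f}")  # close to 1.0
    print(f"mixed: {whitening_metric(mixed):.2f}")  # much larger
```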
2023-05-10 15:29:54,383 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=550010.0, ans=0.05
2023-05-10 15:30:28,648 INFO [train.py:1021] (0/2) Epoch 31, batch 950, loss[loss=0.1479, simple_loss=0.2372, pruned_loss=0.02928, over 37006.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2592, pruned_loss=0.03644, over 7173276.51 frames. ], batch size: 91, lr: 3.55e-03, grad_scale: 16.0
2023-05-10 15:30:40,482 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0062-62366-0_sp0.9 from training. Duration: 22.511125
2023-05-10 15:30:40,532 WARNING [train.py:1182] (0/2) Exclude cut with ID 7395-89880-0031-39906-0 from training. Duration: 20.675
2023-05-10 15:30:42,384 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=550210.0, ans=0.125
2023-05-10 15:31:41,802 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.302e+02 3.080e+02 3.899e+02 4.855e+02 7.501e+02, threshold=7.799e+02, percent-clipped=2.0
2023-05-10 15:31:42,151 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=550360.0, ans=0.1
2023-05-10 15:31:42,152 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=550360.0, ans=0.125
2023-05-10 15:31:44,796 INFO [train.py:1021] (0/2) Epoch 31, batch 1000, loss[loss=0.1892, simple_loss=0.2839, pruned_loss=0.04721, over 36772.00 frames. ], tot_loss[loss=0.1658, simple_loss=0.259, pruned_loss=0.03634, over 7168892.88 frames. ], batch size: 122, lr: 3.55e-03, grad_scale: 16.0
2023-05-10 15:31:48,239 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=550410.0, ans=0.0
2023-05-10 15:31:52,825 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=550410.0, ans=0.125
2023-05-10 15:31:57,503 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=550410.0, ans=0.1
2023-05-10 15:32:02,664 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.55 vs. limit=15.0
2023-05-10 15:32:25,433 WARNING [train.py:1182] (0/2) Exclude cut with ID 6330-62850-0007-91323-0_sp0.9 from training. Duration: 24.9833125
2023-05-10 15:32:39,594 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.62 vs. limit=10.0
2023-05-10 15:32:49,687 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=550610.0, ans=0.0
2023-05-10 15:32:51,013 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=550610.0, ans=0.0
2023-05-10 15:32:55,564 WARNING [train.py:1182] (0/2) Exclude cut with ID 5239-32139-0047-9341-0 from training. Duration: 27.14
2023-05-10 15:32:57,899 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=550610.0, ans=0.95
2023-05-10 15:33:02,143 INFO [train.py:1021] (0/2) Epoch 31, batch 1050, loss[loss=0.1533, simple_loss=0.2427, pruned_loss=0.03191, over 37162.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2592, pruned_loss=0.03653, over 7152312.09 frames. ], batch size: 93, lr: 3.55e-03, grad_scale: 16.0
2023-05-10 15:33:11,046 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0
2023-05-10 15:33:11,925 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0044-62348-0 from training. Duration: 22.44
2023-05-10 15:33:15,071 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.6936, 2.9127, 4.4510, 2.6973], device='cuda:0')
2023-05-10 15:33:18,018 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.5227, 5.3779, 4.6604, 5.1037], device='cuda:0')
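Note on the zipformer.py:1666 lines: for a given self_attn_weights module they print the entropy of each attention head's weight distribution, one value per head; larger values mean a head spreads its attention over more positions. Below is a minimal sketch of computing per-head entropies; the (heads, queries, keys) layout and the averaging over query positions are assumptions for illustration, not the exact diagnostic in zipformer.py.

```python
# Minimal sketch of per-head attention-weight entropy. Tensor layout and the
# averaging over queries are assumptions made for illustration.
import torch


def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """attn_weights: (num_heads, num_queries, num_keys); each row sums to 1.
    Returns one entropy value per head, averaged over query positions."""
    entropy = -(attn_weights * (attn_weights + 1e-20).log()).sum(dim=-1)
    return entropy.mean(dim=-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    weights = torch.softmax(torch.randn(4, 10, 10), dim=-1)  # 4 heads, 10x10 positions
    print(attn_weights_entropy(weights))  # one entropy per head, as in the log lines
```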
2023-05-10 15:33:20,016 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5
2023-05-10 15:33:46,591 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=550810.0, ans=0.125
2023-05-10 15:33:48,107 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=550810.0, ans=0.1
2023-05-10 15:34:05,986 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-05-10 15:34:11,722 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=550860.0, ans=0.1
2023-05-10 15:34:11,760 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=550860.0, ans=0.125
2023-05-10 15:34:15,750 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.555e+02 3.227e+02 3.954e+02 5.012e+02 6.836e+02, threshold=7.908e+02, percent-clipped=0.0
2023-05-10 15:34:16,159 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=550860.0, ans=0.0
2023-05-10 15:34:18,915 INFO [train.py:1021] (0/2) Epoch 31, batch 1100, loss[loss=0.1707, simple_loss=0.2688, pruned_loss=0.03631, over 36837.00 frames. ], tot_loss[loss=0.166, simple_loss=0.2592, pruned_loss=0.03643, over 7155079.15 frames. ], batch size: 111, lr: 3.55e-03, grad_scale: 16.0
2023-05-10 15:34:33,994 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0060-62364-0_sp0.9 from training. Duration: 21.361125
2023-05-10 15:34:39,817 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0079-62383-0_sp1.1 from training. Duration: 27.0318125
2023-05-10 15:34:52,672 WARNING [train.py:1182] (0/2) Exclude cut with ID 5622-44585-0006-90525-0_sp0.9 from training. Duration: 28.638875
2023-05-10 15:35:10,233 WARNING [train.py:1182] (0/2) Exclude cut with ID 3340-169293-0054-76830-0 from training. Duration: 20.4
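Note on the _sp0.9 / _sp1.1 suffixes in the excluded cut IDs: they denote speed-perturbed copies of a base cut, and their durations equal the base duration divided by the speed factor. For example, the 27.14 s cut excluded earlier in this section has an _sp1.1 copy of 24.67275 s that was also excluded, and 27.14 / 1.1 = 24.6727. A quick check against values from this log:

```python
# Cross-check, using values from this log, that a speed-perturbed cut's
# duration equals the base cut's duration divided by the speed factor.
pairs = [
    # (base_duration, speed_factor, logged_perturbed_duration)
    (27.14, 1.1, 24.67275),   # 5239-32139-0047-9341-0 and its _sp1.1 copy
    (20.4, 0.9, 22.6666875),  # 3340-169293-0054-76830-0 and its _sp0.9 copy
]
for base, factor, logged in pairs:
    print(f"{base} / {factor} = {base / factor:.6f}  (logged: {logged})")
    assert abs(base / factor - logged) < 1e-3
```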
2023-05-10 15:35:25,472 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=551110.0, ans=0.125
2023-05-10 15:35:35,565 INFO [train.py:1021] (0/2) Epoch 31, batch 1150, loss[loss=0.1722, simple_loss=0.2683, pruned_loss=0.03804, over 37017.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2592, pruned_loss=0.03653, over 7150232.15 frames. ], batch size: 116, lr: 3.55e-03, grad_scale: 16.0
2023-05-10 15:35:42,283 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0071-62375-0 from training. Duration: 20.025
2023-05-10 15:35:43,847 WARNING [train.py:1182] (0/2) Exclude cut with ID 2364-131735-0112-64612-0_sp0.9 from training. Duration: 20.488875
2023-05-10 15:35:44,077 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=551160.0, ans=0.125
2023-05-10 15:35:50,446 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0079-62383-0 from training. Duration: 29.735
2023-05-10 15:35:50,773 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=551210.0, ans=0.125
2023-05-10 15:36:07,337 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=551260.0, ans=0.125
2023-05-10 15:36:26,661 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.5094, 5.7575, 5.6181, 6.1976], device='cuda:0')
2023-05-10 15:36:49,311 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0
2023-05-10 15:36:50,019 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.506e+02 3.034e+02 3.462e+02 4.224e+02 6.851e+02, threshold=6.925e+02, percent-clipped=0.0
2023-05-10 15:36:52,983 INFO [train.py:1021] (0/2) Epoch 31, batch 1200, loss[loss=0.1585, simple_loss=0.2473, pruned_loss=0.03486, over 36966.00 frames. ], tot_loss[loss=0.1659, simple_loss=0.2589, pruned_loss=0.03638, over 7174118.42 frames. ], batch size: 91, lr: 3.55e-03, grad_scale: 32.0
2023-05-10 15:37:05,255 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=551410.0, ans=0.0
2023-05-10 15:37:13,141 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=551460.0, ans=0.125
2023-05-10 15:37:15,981 WARNING [train.py:1182] (0/2) Exclude cut with ID 7276-92427-0014-12983-0_sp0.9 from training. Duration: 21.3055625
2023-05-10 15:37:17,383 WARNING [train.py:1182] (0/2) Exclude cut with ID 1025-75365-0008-79168-0_sp0.9 from training. Duration: 22.0666875
2023-05-10 15:37:31,504 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=551510.0, ans=0.0
2023-05-10 15:37:43,067 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=551560.0, ans=0.125
2023-05-10 15:37:58,029 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=551610.0, ans=0.125
2023-05-10 15:38:09,505 INFO [train.py:1021] (0/2) Epoch 31, batch 1250, loss[loss=0.1769, simple_loss=0.278, pruned_loss=0.03793, over 32265.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2586, pruned_loss=0.03615, over 7150183.11 frames. ], batch size: 170, lr: 3.55e-03, grad_scale: 16.0
2023-05-10 15:38:18,944 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=551660.0, ans=0.0
2023-05-10 15:38:23,597 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-05-10 15:38:26,675 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=551710.0, ans=0.0
2023-05-10 15:38:28,779 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=551710.0, ans=0.125
2023-05-10 15:38:28,895 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=551710.0, ans=0.125
2023-05-10 15:38:35,496 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=551710.0, ans=0.125
2023-05-10 15:39:00,965 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=551810.0, ans=0.125
2023-05-10 15:39:06,049 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0
2023-05-10 15:39:11,658 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0062-62366-0 from training. Duration: 20.26
2023-05-10 15:39:25,711 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.426e+02 2.971e+02 3.292e+02 3.644e+02 6.816e+02, threshold=6.584e+02, percent-clipped=0.0
2023-05-10 15:39:27,290 INFO [train.py:1021] (0/2) Epoch 31, batch 1300, loss[loss=0.1568, simple_loss=0.2454, pruned_loss=0.03406, over 36881.00 frames. ], tot_loss[loss=0.1653, simple_loss=0.2585, pruned_loss=0.03605, over 7169995.12 frames. ], batch size: 96, lr: 3.55e-03, grad_scale: 16.0
2023-05-10 15:39:28,872 WARNING [train.py:1182] (0/2) Exclude cut with ID 5239-32139-0030-9324-0_sp0.9 from training. Duration: 21.3444375
2023-05-10 15:39:34,094 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=551910.0, ans=0.2
2023-05-10 15:39:35,731 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.4875, 5.3143, 4.6244, 5.0942], device='cuda:0')
2023-05-10 15:39:37,521 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.13 vs. limit=22.5
2023-05-10 15:39:40,677 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0
2023-05-10 15:39:46,246 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=551960.0, ans=0.1
2023-05-10 15:39:55,386 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=551960.0, ans=0.125
2023-05-10 15:40:04,869 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.9692, 4.2970, 4.4992, 4.2160], device='cuda:0')
2023-05-10 15:40:09,937 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.64 vs. limit=15.0
2023-05-10 15:40:31,146 WARNING [train.py:1182] (0/2) Exclude cut with ID 497-129325-0061-62254-0_sp1.1 from training. Duration: 0.97725
2023-05-10 15:40:44,793 INFO [train.py:1021] (0/2) Epoch 31, batch 1350, loss[loss=0.1752, simple_loss=0.2678, pruned_loss=0.04135, over 37026.00 frames. ], tot_loss[loss=0.1651, simple_loss=0.2584, pruned_loss=0.03592, over 7206096.25 frames. ], batch size: 116, lr: 3.54e-03, grad_scale: 16.0
2023-05-10 15:41:14,890 WARNING [train.py:1182] (0/2) Exclude cut with ID 7395-89880-0031-39906-0_sp0.9 from training. Duration: 22.97225
2023-05-10 15:41:25,363 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=552260.0, ans=0.125
2023-05-10 15:41:49,363 WARNING [train.py:1182] (0/2) Exclude cut with ID 7395-89880-0047-39922-0_sp0.9 from training. Duration: 21.97775
2023-05-10 15:41:58,645 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=552360.0, ans=0.04949747468305833
2023-05-10 15:41:59,634 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.485e+02 2.971e+02 3.534e+02 4.225e+02 6.633e+02, threshold=7.069e+02, percent-clipped=1.0
2023-05-10 15:42:01,135 INFO [train.py:1021] (0/2) Epoch 31, batch 1400, loss[loss=0.1515, simple_loss=0.2373, pruned_loss=0.03288, over 37089.00 frames. ], tot_loss[loss=0.1653, simple_loss=0.2586, pruned_loss=0.03603, over 7208270.21 frames. ], batch size: 88, lr: 3.54e-03, grad_scale: 16.0
2023-05-10 15:42:01,247 WARNING [train.py:1182] (0/2) Exclude cut with ID 1112-1043-0006-89194-0_sp0.9 from training. Duration: 21.8333125
2023-05-10 15:42:10,583 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.0409, 4.3550, 3.1646, 2.9650], device='cuda:0')
2023-05-10 15:42:11,785 WARNING [train.py:1182] (0/2) Exclude cut with ID 1914-133440-0031-94921-0 from training. Duration: 20.47
2023-05-10 15:42:20,767 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=552460.0, ans=0.125
2023-05-10 15:42:25,637 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.46 vs. limit=22.5
2023-05-10 15:42:27,478 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.73 vs. limit=10.0
2023-05-10 15:42:58,823 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=15.0
2023-05-10 15:42:59,844 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-05-10 15:43:18,780 INFO [train.py:1021] (0/2) Epoch 31, batch 1450, loss[loss=0.1549, simple_loss=0.2449, pruned_loss=0.03246, over 36964.00 frames. ], tot_loss[loss=0.1645, simple_loss=0.2575, pruned_loss=0.03577, over 7227871.77 frames. ], batch size: 95, lr: 3.54e-03, grad_scale: 8.0
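Note on the grad_scale value in the per-batch lines: it moves between 8.0, 16.0 and 32.0 over this section (it is 8.0 in the batch 1450 line above), which is the signature of a dynamic fp16 loss scale that is halved when overflowing gradients are detected and grown again after a run of stable steps. The sketch below uses the standard torch.cuda.amp.GradScaler to illustrate that behaviour; whether train.py uses this exact class or its own scaler is an assumption.

```python
# Hedged sketch of dynamic fp16 loss scaling with torch.cuda.amp.GradScaler
# (requires a CUDA device, as in this training run). The scale halves on
# overflow and grows again after a stretch of stable steps.
import torch

model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

for _ in range(3):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(4, 10, device="cuda")).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # the step is skipped if inf/nan gradients were found
    scaler.update()         # halves the scale on overflow, otherwise slowly grows it
    print("current grad scale:", scaler.get_scale())
```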
2023-05-10 15:43:23,357 WARNING [train.py:1182] (0/2) Exclude cut with ID 7395-89880-0037-39912-0_sp0.9 from training. Duration: 20.67225
2023-05-10 15:43:31,266 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=552660.0, ans=0.125
2023-05-10 15:43:43,243 WARNING [train.py:1182] (0/2) Exclude cut with ID 1914-133440-0024-94914-0_sp0.9 from training. Duration: 25.2444375
2023-05-10 15:43:43,527 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=552710.0, ans=0.025
2023-05-10 15:44:09,761 WARNING [train.py:1182] (0/2) Exclude cut with ID 3340-169293-0021-76797-0_sp0.9 from training. Duration: 21.1445
2023-05-10 15:44:26,970 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=15.0
2023-05-10 15:44:29,617 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=552860.0, ans=0.125
2023-05-10 15:44:35,229 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.433e+02 3.008e+02 3.523e+02 4.786e+02 7.344e+02, threshold=7.045e+02, percent-clipped=1.0
2023-05-10 15:44:35,262 INFO [train.py:1021] (0/2) Epoch 31, batch 1500, loss[loss=0.1673, simple_loss=0.267, pruned_loss=0.03384, over 34790.00 frames. ], tot_loss[loss=0.1651, simple_loss=0.2582, pruned_loss=0.03601, over 7216323.89 frames. ], batch size: 145, lr: 3.54e-03, grad_scale: 8.0
2023-05-10 15:44:40,047 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=552910.0, ans=0.125
2023-05-10 15:44:44,627 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=552910.0, ans=0.0
2023-05-10 15:45:29,290 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0079-62383-0_sp0.9 from training. Duration: 33.038875
2023-05-10 15:45:41,401 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=553110.0, ans=0.025
2023-05-10 15:45:52,066 INFO [train.py:1021] (0/2) Epoch 31, batch 1550, loss[loss=0.1789, simple_loss=0.278, pruned_loss=0.03983, over 35953.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2575, pruned_loss=0.03587, over 7237055.79 frames. ], batch size: 133, lr: 3.54e-03, grad_scale: 8.0
2023-05-10 15:45:54,018 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=553160.0, ans=0.125
2023-05-10 15:46:09,196 WARNING [train.py:1182] (0/2) Exclude cut with ID 6426-64291-0000-16059-0_sp0.9 from training. Duration: 20.0944375
2023-05-10 15:46:21,573 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=553260.0, ans=0.125
2023-05-10 15:46:25,739 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0044-62348-0_sp1.1 from training. Duration: 20.4
2023-05-10 15:46:26,410 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=22.5
2023-05-10 15:46:33,151 WARNING [train.py:1182] (0/2) Exclude cut with ID 6330-62851-0022-91297-0 from training. Duration: 20.085
2023-05-10 15:46:43,712 WARNING [train.py:1182] (0/2) Exclude cut with ID 4860-13185-0032-76709-0_sp0.9 from training. Duration: 23.07775
2023-05-10 15:46:46,742 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0
2023-05-10 15:46:53,845 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.4893, 3.4811, 3.8103, 3.3293], device='cuda:0')
2023-05-10 15:46:58,909 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553360.0, ans=0.1
2023-05-10 15:47:09,366 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.542e+02 2.986e+02 3.648e+02 4.500e+02 6.936e+02, threshold=7.296e+02, percent-clipped=0.0
2023-05-10 15:47:09,410 INFO [train.py:1021] (0/2) Epoch 31, batch 1600, loss[loss=0.1573, simple_loss=0.242, pruned_loss=0.03626, over 36794.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2573, pruned_loss=0.03595, over 7248220.42 frames. ], batch size: 89, lr: 3.54e-03, grad_scale: 16.0
2023-05-10 15:47:23,411 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=553460.0, ans=0.0
2023-05-10 15:47:35,332 WARNING [train.py:1182] (0/2) Exclude cut with ID 2929-85685-0044-62348-0_sp0.9 from training. Duration: 24.9333125
2023-05-10 15:47:42,575 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=22.5
2023-05-10 15:47:45,258 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=553510.0, ans=0.125
2023-05-10 15:48:01,482 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.40 vs. limit=15.0
2023-05-10 15:48:14,327 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.1977, 3.8373, 3.5455, 3.8523, 3.1588, 2.8827, 3.2857, 2.8252], device='cuda:0')
2023-05-10 15:48:20,076 WARNING [train.py:1182] (0/2) Exclude cut with ID 5118-111612-0016-124680-0_sp0.9 from training. Duration: 20.388875
2023-05-10 15:48:26,073 INFO [train.py:1021] (0/2) Epoch 31, batch 1650, loss[loss=0.1796, simple_loss=0.2776, pruned_loss=0.04082, over 34738.00 frames. ], tot_loss[loss=0.1647, simple_loss=0.2575, pruned_loss=0.03597, over 7230385.98 frames. ], batch size: 145, lr: 3.54e-03, grad_scale: 16.0
2023-05-10 15:48:26,166 WARNING [train.py:1182] (0/2) Exclude cut with ID 432-122774-0017-62487-0_sp1.1 from training. Duration: 20.3590625
2023-05-10 15:48:26,454 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-05-10 15:49:02,109 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553760.0, ans=0.1
2023-05-10 15:49:11,287 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=553810.0, ans=0.125
2023-05-10 15:49:14,195 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=553810.0, ans=0.0
2023-05-10 15:49:26,045 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=553860.0, ans=0.125
2023-05-10 15:49:37,083 WARNING [train.py:1182] (0/2) Exclude cut with ID 3557-8342-0013-54691-0_sp1.1 from training. Duration: 0.836375
2023-05-10 15:49:43,446 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.604e+02 3.201e+02 3.994e+02 5.631e+02 7.886e+02, threshold=7.988e+02, percent-clipped=6.0
2023-05-10 15:49:43,477 INFO [train.py:1021] (0/2) Epoch 31, batch 1700, loss[loss=0.151, simple_loss=0.2341, pruned_loss=0.03399, over 35329.00 frames. ], tot_loss[loss=0.1655, simple_loss=0.2581, pruned_loss=0.03641, over 7208048.12 frames. ], batch size: 78, lr: 3.54e-03, grad_scale: 16.0
2023-05-10 15:49:48,134 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=553910.0, ans=0.2
2023-05-10 15:49:56,080 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0
2023-05-10 15:50:03,906 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0
2023-05-10 15:50:08,157 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0
2023-05-10 15:50:09,575 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=553960.0, ans=0.125
2023-05-10 15:50:09,709 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=553960.0, ans=0.125
2023-05-10 15:50:20,307 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554010.0, ans=0.1
2023-05-10 15:50:20,352 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=554010.0, ans=0.0
2023-05-10 15:50:23,595 WARNING [train.py:1182] (0/2) Exclude cut with ID 8565-290391-0049-67394-0_sp0.9 from training. Duration: 21.3166875
2023-05-10 15:50:26,920 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554010.0, ans=0.1
2023-05-10 15:50:55,915 WARNING [train.py:1182] (0/2) Exclude cut with ID 6533-399-0029-104863-0_sp0.9 from training. Duration: 22.1055625
2023-05-10 15:51:00,274 INFO [train.py:1021] (0/2) Epoch 31, batch 1750, loss[loss=0.1608, simple_loss=0.2484, pruned_loss=0.03661, over 36872.00 frames. ], tot_loss[loss=0.1666, simple_loss=0.2586, pruned_loss=0.0373, over 7224619.52 frames. ], batch size: 96, lr: 3.54e-03, grad_scale: 16.0
2023-05-10 15:51:02,119 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=554160.0, ans=0.125
2023-05-10 15:51:07,898 WARNING [train.py:1182] (0/2) Exclude cut with ID 7699-105389-0094-26379-0_sp1.1 from training. Duration: 21.77725
2023-05-10 15:51:25,539 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.7133, 4.9713, 5.2128, 4.8753], device='cuda:0')
2023-05-10 15:51:27,132 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=554210.0, ans=0.2
2023-05-10 15:51:29,900 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0005-134304-0_sp0.9 from training. Duration: 27.8166875
2023-05-10 15:51:31,749 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=554260.0, ans=0.125
2023-05-10 15:51:52,996 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0021-15852-0_sp1.1 from training. Duration: 22.5090625
2023-05-10 15:52:00,577 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0005-134304-0 from training. Duration: 25.035
2023-05-10 15:52:03,886 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=554360.0, ans=0.0
2023-05-10 15:52:09,421 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.65 vs. limit=10.0
2023-05-10 15:52:17,446 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.880e+02 3.405e+02 3.760e+02 4.228e+02 6.080e+02, threshold=7.520e+02, percent-clipped=0.0
2023-05-10 15:52:17,478 INFO [train.py:1021] (0/2) Epoch 31, batch 1800, loss[loss=0.177, simple_loss=0.2695, pruned_loss=0.04222, over 37174.00 frames. ], tot_loss[loss=0.1682, simple_loss=0.2594, pruned_loss=0.03848, over 7199208.83 frames. ], batch size: 112, lr: 3.54e-03, grad_scale: 16.0
2023-05-10 15:52:19,092 WARNING [train.py:1182] (0/2) Exclude cut with ID 774-127930-0014-10412-0_sp1.1 from training. Duration: 0.95
2023-05-10 15:52:37,501 WARNING [train.py:1182] (0/2) Exclude cut with ID 3033-130750-0096-55598-0_sp0.9 from training. Duration: 0.92225
2023-05-10 15:52:48,427 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3479, 4.5313, 2.1589, 2.4944], device='cuda:0')
2023-05-10 15:53:05,295 WARNING [train.py:1182] (0/2) Exclude cut with ID 4511-76322-0006-80011-0 from training. Duration: 21.97
2023-05-10 15:53:16,460 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0
2023-05-10 15:53:25,350 WARNING [train.py:1182] (0/2) Exclude cut with ID 7492-105653-0055-62765-0_sp0.9 from training. Duration: 21.97225
2023-05-10 15:53:26,795 WARNING [train.py:1182] (0/2) Exclude cut with ID 453-131332-0000-47844-0_sp0.9 from training. Duration: 25.3333125
2023-05-10 15:53:34,209 INFO [train.py:1021] (0/2) Epoch 31, batch 1850, loss[loss=0.1756, simple_loss=0.2689, pruned_loss=0.04118, over 37046.00 frames. ], tot_loss[loss=0.17, simple_loss=0.2606, pruned_loss=0.03975, over 7205749.25 frames. ], batch size: 110, lr: 3.54e-03, grad_scale: 16.0
2023-05-10 15:53:35,835 WARNING [train.py:1182] (0/2) Exclude cut with ID 5172-29468-0015-19128-0_sp0.9 from training. Duration: 21.5055625
2023-05-10 15:53:46,127 WARNING [train.py:1182] (0/2) Exclude cut with ID 453-131332-0000-47844-0_sp1.1 from training. Duration: 20.72725
2023-05-10 15:53:47,921 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=554710.0, ans=0.0
2023-05-10 15:53:52,393 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=554710.0, ans=0.125
2023-05-10 15:53:54,548 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=554710.0, ans=0.125
2023-05-10 15:53:58,875 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=554710.0, ans=0.125
2023-05-10 15:54:23,438 WARNING [train.py:1182] (0/2) Exclude cut with ID 8631-249866-0030-130156-0_sp0.9 from training. Duration: 26.32775
2023-05-10 15:54:23,765 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=554810.0, ans=0.125
2023-05-10 15:54:27,270 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0
2023-05-10 15:54:38,764 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554860.0, ans=0.1
2023-05-10 15:54:51,171 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.812e+02 3.440e+02 3.852e+02 4.214e+02 5.941e+02, threshold=7.704e+02, percent-clipped=0.0
2023-05-10 15:54:51,203 INFO [train.py:1021] (0/2) Epoch 31, batch 1900, loss[loss=0.1643, simple_loss=0.2476, pruned_loss=0.04054, over 37072.00 frames. ], tot_loss[loss=0.1704, simple_loss=0.2602, pruned_loss=0.04031, over 7210544.66 frames. ], batch size: 94, lr: 3.54e-03, grad_scale: 16.0
2023-05-10 15:54:55,588 WARNING [train.py:1182] (0/2) Exclude cut with ID 3867-173237-0077-144769-0 from training. Duration: 20.025
2023-05-10 15:55:01,499 WARNING [train.py:1182] (0/2) Exclude cut with ID 6709-74022-0004-86860-0_sp1.1 from training. Duration: 0.9409375
2023-05-10 15:55:01,509 WARNING [train.py:1182] (0/2) Exclude cut with ID 4757-1811-0023-62229-0_sp0.9 from training. Duration: 21.37775
2023-05-10 15:55:21,861 WARNING [train.py:1182] (0/2) Exclude cut with ID 1250-135782-0004-25974-0_sp0.9 from training. Duration: 21.17225
2023-05-10 15:55:21,883 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0021-15852-0_sp0.9 from training. Duration: 27.511125
2023-05-10 15:55:51,362 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=555110.0, ans=0.125
2023-05-10 15:55:52,856 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=555110.0, ans=0.125
2023-05-10 15:55:56,994 WARNING [train.py:1182] (0/2) Exclude cut with ID 453-131332-0000-47844-0 from training. Duration: 22.8
2023-05-10 15:56:01,556 WARNING [train.py:1182] (0/2) Exclude cut with ID 4964-30587-0040-44509-0 from training. Duration: 22.585
2023-05-10 15:56:08,043 INFO [train.py:1021] (0/2) Epoch 31, batch 1950, loss[loss=0.1911, simple_loss=0.2804, pruned_loss=0.05094, over 32509.00 frames. ], tot_loss[loss=0.1718, simple_loss=0.261, pruned_loss=0.04126, over 7180604.04 frames. ], batch size: 170, lr: 3.53e-03, grad_scale: 16.0
2023-05-10 15:56:32,190 WARNING [train.py:1182] (0/2) Exclude cut with ID 6978-92210-0001-146967-0_sp0.9 from training. Duration: 22.0166875
2023-05-10 15:56:32,468 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=555210.0, ans=0.125
2023-05-10 15:56:48,077 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0003-134302-0_sp1.1 from training. Duration: 24.395375
2023-05-10 15:56:54,142 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-85273-0017-41203-0_sp0.9 from training. Duration: 27.47775
2023-05-10 15:56:56,555 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-05-10 15:56:59,136 WARNING [train.py:1182] (0/2) Exclude cut with ID 432-122774-0017-62487-0_sp0.9 from training. Duration: 24.8833125
2023-05-10 15:57:02,237 WARNING [train.py:1182] (0/2) Exclude cut with ID 6758-72288-0033-108368-0 from training. Duration: 23.39
2023-05-10 15:57:09,655 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0007-12994-0_sp0.9 from training. Duration: 28.72225
2023-05-10 15:57:09,956 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555360.0, ans=0.1
2023-05-10 15:57:11,957 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.32 vs. limit=6.0
2023-05-10 15:57:12,990 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=555360.0, ans=0.125
2023-05-10 15:57:18,576 WARNING [train.py:1182] (0/2) Exclude cut with ID 585-294811-0110-133686-0_sp0.9 from training. Duration: 20.8944375
2023-05-10 15:57:25,050 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.744e+02 3.555e+02 4.043e+02 4.521e+02 6.056e+02, threshold=8.087e+02, percent-clipped=0.0
2023-05-10 15:57:25,083 INFO [train.py:1021] (0/2) Epoch 31, batch 2000, loss[loss=0.1558, simple_loss=0.2398, pruned_loss=0.03589, over 37183.00 frames. ], tot_loss[loss=0.1723, simple_loss=0.2608, pruned_loss=0.04193, over 7149230.20 frames. ], batch size: 93, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 15:57:28,356 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=555410.0, ans=0.125
2023-05-10 15:57:35,711 WARNING [train.py:1182] (0/2) Exclude cut with ID 5796-66357-0007-116447-0_sp0.9 from training. Duration: 23.8444375
2023-05-10 15:57:37,521 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=555410.0, ans=0.2
2023-05-10 15:57:43,643 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-05-10 15:57:46,573 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=555460.0, ans=0.0
2023-05-10 15:57:50,836 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=555460.0, ans=0.0
2023-05-10 15:57:58,641 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0007-12994-0 from training. Duration: 25.85
2023-05-10 15:58:00,124 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0023-13010-0 from training. Duration: 21.39
2023-05-10 15:58:10,597 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0014-15845-0 from training. Duration: 27.92
2023-05-10 15:58:36,013 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=22.5
2023-05-10 15:58:38,367 WARNING [train.py:1182] (0/2) Exclude cut with ID 8631-249866-0039-130165-0_sp0.9 from training. Duration: 20.661125
2023-05-10 15:58:41,238 INFO [train.py:1021] (0/2) Epoch 31, batch 2050, loss[loss=0.1585, simple_loss=0.239, pruned_loss=0.03897, over 36975.00 frames. ], tot_loss[loss=0.1727, simple_loss=0.2607, pruned_loss=0.04231, over 7150769.88 frames. ], batch size: 91, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 15:58:46,826 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=555660.0, ans=0.0
2023-05-10 15:58:57,360 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=555710.0, ans=0.0
2023-05-10 15:59:02,956 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0043-15874-0_sp0.9 from training. Duration: 20.07225
2023-05-10 15:59:09,577 WARNING [train.py:1182] (0/2) Exclude cut with ID 1085-156170-0017-128270-0 from training. Duration: 21.01
2023-05-10 15:59:20,465 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555760.0, ans=0.1
2023-05-10 15:59:31,205 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=555810.0, ans=15.0
2023-05-10 15:59:32,495 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=555810.0, ans=0.125
2023-05-10 15:59:55,636 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=555860.0, ans=0.0
2023-05-10 15:59:58,074 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.786e+02 3.584e+02 4.318e+02 5.188e+02 8.901e+02, threshold=8.636e+02, percent-clipped=3.0
2023-05-10 15:59:58,106 INFO [train.py:1021] (0/2) Epoch 31, batch 2100, loss[loss=0.1685, simple_loss=0.2524, pruned_loss=0.04232, over 37077.00 frames. ], tot_loss[loss=0.1731, simple_loss=0.2607, pruned_loss=0.04276, over 7136577.29 frames. ], batch size: 94, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 15:59:58,482 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=555910.0, ans=0.125
2023-05-10 16:00:01,427 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555910.0, ans=0.1
2023-05-10 16:00:10,921 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=555910.0, ans=0.125
2023-05-10 16:00:13,777 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=555960.0, ans=0.95
2023-05-10 16:00:22,753 WARNING [train.py:1182] (0/2) Exclude cut with ID 2195-150901-0045-59933-0 from training. Duration: 20.65
2023-05-10 16:00:24,659 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=555960.0, ans=0.125
2023-05-10 16:00:32,196 WARNING [train.py:1182] (0/2) Exclude cut with ID 5796-66357-0007-116447-0 from training. Duration: 21.46
2023-05-10 16:01:14,496 INFO [train.py:1021] (0/2) Epoch 31, batch 2150, loss[loss=0.1503, simple_loss=0.2334, pruned_loss=0.03358, over 36926.00 frames. ], tot_loss[loss=0.1726, simple_loss=0.26, pruned_loss=0.04261, over 7144024.46 frames. ], batch size: 91, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 16:01:20,593 WARNING [train.py:1182] (0/2) Exclude cut with ID 3557-8342-0013-54691-0 from training. Duration: 0.92
2023-05-10 16:01:25,527 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0
2023-05-10 16:01:28,567 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0023-13010-0_sp0.9 from training. Duration: 23.7666875
2023-05-10 16:02:02,423 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=556310.0, ans=0.0
2023-05-10 16:02:06,717 WARNING [train.py:1182] (0/2) Exclude cut with ID 8544-281189-0060-101339-0_sp0.9 from training. Duration: 20.861125
2023-05-10 16:02:07,030 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=556310.0, ans=0.125
2023-05-10 16:02:17,185 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-65654-0031-41259-0_sp0.9 from training. Duration: 22.711125
2023-05-10 16:02:25,802 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=15.0
2023-05-10 16:02:31,087 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.802e+02 3.512e+02 3.989e+02 4.976e+02 6.402e+02, threshold=7.979e+02, percent-clipped=0.0
2023-05-10 16:02:31,119 INFO [train.py:1021] (0/2) Epoch 31, batch 2200, loss[loss=0.1648, simple_loss=0.25, pruned_loss=0.03976, over 37049.00 frames. ], tot_loss[loss=0.1728, simple_loss=0.2601, pruned_loss=0.04275, over 7172006.46 frames. ], batch size: 99, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 16:02:44,386 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0
2023-05-10 16:03:01,633 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0043-132310-0_sp1.1 from training. Duration: 22.986375
2023-05-10 16:03:06,343 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=556510.0, ans=0.125
2023-05-10 16:03:07,286 INFO [scaling.py:969] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.69 vs. limit=5.0
2023-05-10 16:03:20,304 WARNING [train.py:1182] (0/2) Exclude cut with ID 8040-260924-0003-80960-0_sp0.9 from training. Duration: 22.07225
2023-05-10 16:03:24,782 WARNING [train.py:1182] (0/2) Exclude cut with ID 7699-105389-0045-26330-0_sp0.9 from training. Duration: 20.3055625
2023-05-10 16:03:26,288 WARNING [train.py:1182] (0/2) Exclude cut with ID 6356-271890-0060-94317-0_sp0.9 from training. Duration: 20.72225
2023-05-10 16:03:35,567 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=556610.0, ans=0.025
2023-05-10 16:03:36,933 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.0032, 5.4071, 5.2268, 5.8049], device='cuda:0')
2023-05-10 16:03:37,055 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=556610.0, ans=0.125
2023-05-10 16:03:42,261 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-05-10 16:03:47,956 INFO [train.py:1021] (0/2) Epoch 31, batch 2250, loss[loss=0.1734, simple_loss=0.2641, pruned_loss=0.04132, over 36706.00 frames. ], tot_loss[loss=0.1731, simple_loss=0.2603, pruned_loss=0.04295, over 7183692.59 frames. ], batch size: 122, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 16:03:48,039 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-85273-0017-41203-0_sp1.1 from training. Duration: 22.4818125
2023-05-10 16:04:10,873 WARNING [train.py:1182] (0/2) Exclude cut with ID 4964-30587-0040-44509-0_sp0.9 from training. Duration: 25.0944375
2023-05-10 16:04:13,805 WARNING [train.py:1182] (0/2) Exclude cut with ID 6533-399-0047-104881-0 from training. Duration: 21.515
2023-05-10 16:04:21,375 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0009-15840-0_sp0.9 from training. Duration: 27.02225
2023-05-10 16:04:21,618 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=556760.0, ans=0.125
2023-05-10 16:04:27,418 WARNING [train.py:1182] (0/2) Exclude cut with ID 432-122774-0010-62480-0_sp0.9 from training. Duration: 22.22225
2023-05-10 16:04:32,689 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=556810.0, ans=0.0
2023-05-10 16:04:35,554 WARNING [train.py:1182] (0/2) Exclude cut with ID 4964-30587-0085-44554-0_sp0.9 from training. Duration: 20.85
2023-05-10 16:04:35,824 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=556810.0, ans=0.125
2023-05-10 16:04:38,814 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=556810.0, ans=0.125
2023-05-10 16:04:55,241 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=556860.0, ans=0.0
2023-05-10 16:04:55,364 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=556860.0, ans=0.0
2023-05-10 16:05:04,684 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.870e+02 3.533e+02 3.914e+02 4.507e+02 7.182e+02, threshold=7.827e+02, percent-clipped=0.0
2023-05-10 16:05:04,715 INFO [train.py:1021] (0/2) Epoch 31, batch 2300, loss[loss=0.1641, simple_loss=0.2526, pruned_loss=0.03776, over 37153.00 frames. ], tot_loss[loss=0.1734, simple_loss=0.2604, pruned_loss=0.04317, over 7156801.58 frames. ], batch size: 102, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 16:05:07,692 WARNING [train.py:1182] (0/2) Exclude cut with ID 4295-39940-0007-92567-0 from training. Duration: 21.54
2023-05-10 16:05:13,567 WARNING [train.py:1182] (0/2) Exclude cut with ID 4964-30587-0040-44509-0_sp1.1 from training. Duration: 20.5318125
2023-05-10 16:05:15,343 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=556910.0, ans=0.0
2023-05-10 16:05:18,247 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=556960.0, ans=0.1
2023-05-10 16:05:24,706 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0012-134311-0_sp0.9 from training. Duration: 21.9333125
2023-05-10 16:05:40,268 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=557010.0, ans=0.125
2023-05-10 16:05:46,390 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=557010.0, ans=0.125
2023-05-10 16:05:50,645 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=557060.0, ans=0.2
2023-05-10 16:06:03,504 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=557060.0, ans=0.2
2023-05-10 16:06:12,439 WARNING [train.py:1182] (0/2) Exclude cut with ID 8631-249866-0025-130151-0_sp0.9 from training. Duration: 21.7944375
2023-05-10 16:06:21,998 INFO [train.py:1021] (0/2) Epoch 31, batch 2350, loss[loss=0.1606, simple_loss=0.2406, pruned_loss=0.04028, over 37055.00 frames. ], tot_loss[loss=0.1733, simple_loss=0.2602, pruned_loss=0.04318, over 7121551.80 frames. ], batch size: 88, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 16:06:23,932 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=557160.0, ans=0.0
2023-05-10 16:06:26,548 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0002-12989-0_sp0.9 from training. Duration: 22.4666875
2023-05-10 16:06:26,750 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=557160.0, ans=0.125
2023-05-10 16:06:34,214 WARNING [train.py:1182] (0/2) Exclude cut with ID 6121-9014-0076-24124-0 from training. Duration: 21.635
2023-05-10 16:06:40,101 WARNING [train.py:1182] (0/2) Exclude cut with ID 6121-9014-0076-24124-0_sp0.9 from training. Duration: 24.038875
2023-05-10 16:07:26,359 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0009-134308-0_sp1.1 from training. Duration: 21.786375
2023-05-10 16:07:38,108 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.873e+02 3.512e+02 3.763e+02 4.233e+02 6.275e+02, threshold=7.527e+02, percent-clipped=0.0
2023-05-10 16:07:38,169 INFO [train.py:1021] (0/2) Epoch 31, batch 2400, loss[loss=0.1502, simple_loss=0.2341, pruned_loss=0.03314, over 36938.00 frames. ], tot_loss[loss=0.1728, simple_loss=0.2597, pruned_loss=0.04295, over 7124553.68 frames. ], batch size: 91, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 16:07:38,273 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0002-12989-0 from training. Duration: 20.22
2023-05-10 16:07:45,011 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=557410.0, ans=0.1
2023-05-10 16:07:47,995 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=557410.0, ans=0.125
2023-05-10 16:08:16,379 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.00 vs. limit=15.0
2023-05-10 16:08:20,545 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=557510.0, ans=0.09899494936611666
2023-05-10 16:08:46,530 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=557610.0, ans=0.09899494936611666
2023-05-10 16:08:49,579 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.5443, 4.8514, 5.0249, 4.7270], device='cuda:0')
2023-05-10 16:08:55,196 INFO [train.py:1021] (0/2) Epoch 31, batch 2450, loss[loss=0.1714, simple_loss=0.2658, pruned_loss=0.03856, over 37153.00 frames. ], tot_loss[loss=0.1732, simple_loss=0.2601, pruned_loss=0.04315, over 7123872.16 frames. ], batch size: 112, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 16:08:55,485 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=557660.0, ans=0.125
2023-05-10 16:09:18,804 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=557710.0, ans=0.125
2023-05-10 16:09:19,074 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=12.0
2023-05-10 16:09:40,346 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=557810.0, ans=0.0
2023-05-10 16:09:44,508 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0043-132310-0 from training. Duration: 25.285
2023-05-10 16:09:44,870 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.8480, 2.9642, 4.6076, 3.1181], device='cuda:0')
2023-05-10 16:09:44,878 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=557810.0, ans=0.5
2023-05-10 16:10:11,889 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.849e+02 3.602e+02 4.214e+02 4.879e+02 7.729e+02, threshold=8.429e+02, percent-clipped=1.0
2023-05-10 16:10:11,921 INFO [train.py:1021] (0/2) Epoch 31, batch 2500, loss[loss=0.1724, simple_loss=0.2631, pruned_loss=0.0408, over 37188.00 frames. ], tot_loss[loss=0.1737, simple_loss=0.2608, pruned_loss=0.0433, over 7108519.94 frames. ], batch size: 102, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 16:10:21,715 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=15.0
2023-05-10 16:10:54,058 WARNING [train.py:1182] (0/2) Exclude cut with ID 811-130148-0001-63453-0_sp0.9 from training. Duration: 20.861125
2023-05-10 16:11:15,262 WARNING [train.py:1182] (0/2) Exclude cut with ID 6010-56788-0055-90261-0 from training. Duration: 20.88
2023-05-10 16:11:19,089 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=558110.0, ans=0.0
2023-05-10 16:11:29,420 INFO [train.py:1021] (0/2) Epoch 31, batch 2550, loss[loss=0.1593, simple_loss=0.2382, pruned_loss=0.04017, over 36777.00 frames. ], tot_loss[loss=0.1738, simple_loss=0.2611, pruned_loss=0.04328, over 7099415.29 frames. ], batch size: 89, lr: 3.53e-03, grad_scale: 32.0
2023-05-10 16:11:45,238 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=558210.0, ans=0.125
2023-05-10 16:11:46,354 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0045-15876-0_sp0.9 from training. Duration: 23.4166875
2023-05-10 16:11:47,887 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-05-10 16:12:15,215 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-05-10 16:12:16,853 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=558310.0, ans=0.09899494936611666
2023-05-10 16:12:45,145 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.702e+02 3.568e+02 3.932e+02 4.556e+02 7.220e+02, threshold=7.865e+02, percent-clipped=0.0
2023-05-10 16:12:45,176 INFO [train.py:1021] (0/2) Epoch 31, batch 2600, loss[loss=0.1763, simple_loss=0.2718, pruned_loss=0.04037, over 34507.00 frames. ], tot_loss[loss=0.1742, simple_loss=0.2611, pruned_loss=0.04361, over 7051270.31 frames. ], batch size: 144, lr: 3.52e-03, grad_scale: 32.0
2023-05-10 16:12:56,797 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.44 vs. limit=15.0
2023-05-10 16:13:00,897 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0006-134305-0 from training. Duration: 21.24
2023-05-10 16:13:00,911 WARNING [train.py:1182] (0/2) Exclude cut with ID 6533-399-0047-104881-0_sp0.9 from training. Duration: 23.9055625
2023-05-10 16:13:24,299 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=558510.0, ans=0.125
2023-05-10 16:13:27,544 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0
2023-05-10 16:13:29,390 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0
2023-05-10 16:13:36,071 WARNING [train.py:1182] (0/2) Exclude cut with ID 6758-72288-0033-108368-0_sp0.9 from training. Duration: 25.988875
2023-05-10 16:13:41,079 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-05-10 16:13:43,765 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0001-134300-0_sp0.9 from training. Duration: 20.67225
2023-05-10 16:13:50,760 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=558610.0, ans=0.2
2023-05-10 16:13:59,984 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0
2023-05-10 16:14:02,161 INFO [train.py:1021] (0/2) Epoch 31, batch 2650, loss[loss=0.149, simple_loss=0.2288, pruned_loss=0.03463, over 36945.00 frames. ], tot_loss[loss=0.1729, simple_loss=0.2597, pruned_loss=0.04304, over 7064083.97 frames. ], batch size: 91, lr: 3.52e-03, grad_scale: 32.0
2023-05-10 16:14:34,330 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-85273-0038-41224-0 from training. Duration: 20.34
2023-05-10 16:14:51,267 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=558810.0, ans=0.125
2023-05-10 16:14:54,173 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=558810.0, ans=0.0
2023-05-10 16:14:54,264 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=558810.0, ans=0.125
2023-05-10 16:14:55,654 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=558810.0, ans=0.125
2023-05-10 16:15:00,583 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558810.0, ans=0.1
2023-05-10 16:15:03,695 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=558860.0, ans=0.0
2023-05-10 16:15:18,089 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.921e+02 3.403e+02 3.615e+02 4.137e+02 6.796e+02, threshold=7.230e+02, percent-clipped=0.0
2023-05-10 16:15:18,120 INFO [train.py:1021] (0/2) Epoch 31, batch 2700, loss[loss=0.2043, simple_loss=0.284, pruned_loss=0.0623, over 24340.00 frames. ], tot_loss[loss=0.1725, simple_loss=0.2594, pruned_loss=0.04281, over 7080528.59 frames. ], batch size: 236, lr: 3.52e-03, grad_scale: 32.0
2023-05-10 16:15:31,157 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0
2023-05-10 16:15:45,268 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=558960.0, ans=0.125
2023-05-10 16:15:49,792 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-05-10 16:15:54,100 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0026-15857-0_sp0.9 from training. Duration: 25.061125
2023-05-10 16:15:54,711 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.54 vs. limit=15.0
2023-05-10 16:16:05,220 WARNING [train.py:1182] (0/2) Exclude cut with ID 3033-130750-0096-55598-0 from training. Duration: 0.83
2023-05-10 16:16:13,006 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=559060.0, ans=0.0
2023-05-10 16:16:14,531 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=559060.0, ans=0.125
2023-05-10 16:16:22,282 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=559110.0, ans=0.0
2023-05-10 16:16:26,965 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=559110.0, ans=0.125
2023-05-10 16:16:33,245 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-85273-0017-41203-0 from training. Duration: 24.73
2023-05-10 16:16:36,188 INFO [train.py:1021] (0/2) Epoch 31, batch 2750, loss[loss=0.1691, simple_loss=0.258, pruned_loss=0.04012, over 37187.00 frames. ], tot_loss[loss=0.1725, simple_loss=0.2594, pruned_loss=0.04278, over 7076001.44 frames. ], batch size: 102, lr: 3.52e-03, grad_scale: 32.0
2023-05-10 16:16:45,205 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0009-134308-0 from training. Duration: 23.965
2023-05-10 16:16:56,154 WARNING [train.py:1182] (0/2) Exclude cut with ID 6978-92210-0030-146996-0_sp0.9 from training. Duration: 22.088875
2023-05-10 16:17:11,205 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0006-134305-0_sp0.9 from training. Duration: 23.6
2023-05-10 16:17:11,518 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.2038, 3.2047, 3.0168, 3.7191, 2.0998, 3.2434, 3.7701, 3.3226], device='cuda:0')
2023-05-10 16:17:15,908 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=559260.0, ans=0.025
2023-05-10 16:17:34,127 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559310.0, ans=0.1
2023-05-10 16:17:35,693 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=559360.0, ans=0.0
2023-05-10 16:17:42,464 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=559360.0, ans=0.04949747468305833
2023-05-10 16:17:45,497 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559360.0, ans=0.1
2023-05-10 16:17:52,696 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.949e+02 3.591e+02 4.328e+02 5.388e+02 7.598e+02, threshold=8.657e+02, percent-clipped=4.0
2023-05-10 16:17:52,728 INFO [train.py:1021] (0/2) Epoch 31, batch 2800, loss[loss=0.2061, simple_loss=0.2831, pruned_loss=0.06455, over 23517.00 frames. ], tot_loss[loss=0.1723, simple_loss=0.2594, pruned_loss=0.04257, over 7089019.42 frames. ], batch size: 234, lr: 3.52e-03, grad_scale: 32.0
2023-05-10 16:17:53,541 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5
2023-05-10 16:17:57,821 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0
2023-05-10 16:18:03,675 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-05-10 16:18:11,230 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=559460.0, ans=0.1
2023-05-10 16:18:19,461 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.1317, 4.1263, 4.6202, 4.8372], device='cuda:0')
2023-05-10 16:18:23,862 INFO [scaling.py:1059] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-05-10 16:18:47,355 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1087, 4.2737, 4.6435, 4.6889], device='cuda:0')
2023-05-10 16:19:00,658 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0024-13011-0 from training. Duration: 23.795
2023-05-10 16:19:09,247 INFO [train.py:1021] (0/2) Epoch 31, batch 2850, loss[loss=0.173, simple_loss=0.2657, pruned_loss=0.04014, over 37080.00 frames. ], tot_loss[loss=0.1719, simple_loss=0.2589, pruned_loss=0.04245, over 7095677.12 frames. ], batch size: 110, lr: 3.52e-03, grad_scale: 16.0
2023-05-10 16:19:19,121 WARNING [train.py:1182] (0/2) Exclude cut with ID 8631-249866-0030-130156-0_sp1.1 from training. Duration: 21.5409375
2023-05-10 16:19:22,144 WARNING [train.py:1182] (0/2) Exclude cut with ID 6978-92210-0019-146985-0_sp0.9 from training. Duration: 24.97775
2023-05-10 16:19:34,749 WARNING [train.py:1182] (0/2) Exclude cut with ID 1085-156170-0017-128270-0_sp0.9 from training. Duration: 23.3444375
2023-05-10 16:19:35,015 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559710.0, ans=0.1
2023-05-10 16:19:54,585 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=559810.0, ans=0.0
2023-05-10 16:20:00,636 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.2576, 4.0769, 3.7235, 4.0538, 3.4000, 3.0736, 3.5379, 2.9976], device='cuda:0')
2023-05-10 16:20:03,413 WARNING [train.py:1182] (0/2) Exclude cut with ID 6010-56788-0055-90261-0_sp0.9 from training. Duration: 23.2
2023-05-10 16:20:03,634 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=559810.0, ans=0.0
2023-05-10 16:20:09,995 WARNING [train.py:1182] (0/2) Exclude cut with ID 5653-46179-0060-117930-0_sp0.9 from training. Duration: 21.17225
2023-05-10 16:20:25,875 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5950, 3.6798, 4.0928, 3.6263], device='cuda:0')
2023-05-10 16:20:26,940 INFO [train.py:1021] (0/2) Epoch 31, batch 2900, loss[loss=0.1698, simple_loss=0.263, pruned_loss=0.0383, over 36902.00 frames. ], tot_loss[loss=0.1707, simple_loss=0.2577, pruned_loss=0.04188, over 7136338.42 frames. ], batch size: 105, lr: 3.52e-03, grad_scale: 16.0
2023-05-10 16:20:28,314 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.953e+02 3.836e+02 4.434e+02 5.575e+02 8.595e+02, threshold=8.868e+02, percent-clipped=0.0
2023-05-10 16:20:33,132 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0008-134307-0_sp0.9 from training. Duration: 24.6555625
2023-05-10 16:20:37,833 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=559910.0, ans=0.025
2023-05-10 16:20:52,689 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp1119-smaller-md1500/checkpoint-112000.pt
2023-05-10 16:21:03,374 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=560010.0, ans=0.5
2023-05-10 16:21:15,868 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=560060.0, ans=0.125
2023-05-10 16:21:21,026 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.1765, 3.4967, 3.2442, 3.4387, 3.0030, 2.7712, 3.1962, 2.7139], device='cuda:0')
2023-05-10 16:21:31,201 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-65654-0031-41259-0 from training. Duration: 20.44
2023-05-10 16:21:44,565 INFO [train.py:1021] (0/2) Epoch 31, batch 2950, loss[loss=0.1616, simple_loss=0.2583, pruned_loss=0.03245, over 36920.00 frames. ], tot_loss[loss=0.171, simple_loss=0.2581, pruned_loss=0.04195, over 7115895.41 frames. ], batch size: 105, lr: 3.52e-03, grad_scale: 16.0
2023-05-10 16:21:46,113 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0018-132285-0_sp0.9 from training. Duration: 23.45
2023-05-10 16:22:02,419 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0
2023-05-10 16:22:14,938 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=560260.0, ans=0.2
2023-05-10 16:22:19,194 WARNING [train.py:1182] (0/2) Exclude cut with ID 6945-60535-0076-12784-0_sp0.9 from training. Duration: 20.52225
2023-05-10 16:22:28,073 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0008-134307-0 from training. Duration: 22.19
2023-05-10 16:22:38,655 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0014-15845-0_sp1.1 from training. Duration: 25.3818125
2023-05-10 16:22:46,758 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=560360.0, ans=0.04949747468305833
2023-05-10 16:22:58,769 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0043-132310-0_sp0.9 from training. Duration: 28.0944375
2023-05-10 16:23:01,921 INFO [train.py:1021] (0/2) Epoch 31, batch 3000, loss[loss=0.1513, simple_loss=0.2386, pruned_loss=0.032, over 35362.00 frames. ], tot_loss[loss=0.1711, simple_loss=0.2581, pruned_loss=0.04208, over 7117271.39 frames. ], batch size: 78, lr: 3.52e-03, grad_scale: 16.0
2023-05-10 16:23:01,922 INFO [train.py:1048] (0/2) Computing validation loss
2023-05-10 16:23:12,854 INFO [train.py:1057] (0/2) Epoch 31, validation: loss=0.1522, simple_loss=0.2533, pruned_loss=0.02555, over 944034.00 frames.
2023-05-10 16:23:12,855 INFO [train.py:1058] (0/2) Maximum memory allocated so far is 18682MB
2023-05-10 16:23:14,336 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.810e+02 3.515e+02 3.984e+02 4.552e+02 6.373e+02, threshold=7.968e+02, percent-clipped=0.0
2023-05-10 16:23:15,873 WARNING [train.py:1182] (0/2) Exclude cut with ID 2195-150901-0045-59933-0_sp0.9 from training. Duration: 22.9444375
2023-05-10 16:23:23,413 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0024-13011-0_sp1.1 from training. Duration: 21.6318125
2023-05-10 16:23:39,331 WARNING [train.py:1182] (0/2) Exclude cut with ID 8631-249866-0030-130156-0 from training. Duration: 23.695
2023-05-10 16:23:58,092 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=560560.0, ans=0.2
2023-05-10 16:24:08,157 WARNING [train.py:1182] (0/2) Exclude cut with ID 7699-105389-0094-26379-0 from training. Duration: 23.955
2023-05-10 16:24:20,438 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=560610.0, ans=0.125
2023-05-10 16:24:29,293 INFO [train.py:1021] (0/2) Epoch 31, batch 3050, loss[loss=0.1892, simple_loss=0.2781, pruned_loss=0.05008, over 35818.00 frames. ], tot_loss[loss=0.1721, simple_loss=0.2591, pruned_loss=0.04253, over 7093786.75 frames. ], batch size: 133, lr: 3.52e-03, grad_scale: 16.0
2023-05-10 16:24:33,298 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=560660.0, ans=0.125
2023-05-10 16:24:39,012 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=560660.0, ans=0.125
2023-05-10 16:24:43,670 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0024-13011-0_sp0.9 from training. Duration: 26.438875
2023-05-10 16:25:00,567 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=560760.0, ans=0.0
2023-05-10 16:25:07,962 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.0331, 5.2123, 5.3296, 5.8908], device='cuda:0')
2023-05-10 16:25:20,050 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.3132, 5.6960, 5.5395, 6.0955], device='cuda:0')
2023-05-10 16:25:29,556 WARNING [train.py:1182] (0/2) Exclude cut with ID 7699-105389-0021-26306-0_sp0.9 from training. Duration: 21.2444375
2023-05-10 16:25:29,582 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0014-15845-0_sp0.9 from training. Duration: 31.02225
2023-05-10 16:25:31,675 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=15.0
2023-05-10 16:25:36,517 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=560860.0, ans=0.2
2023-05-10 16:25:39,621 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.4042, 4.1631, 3.8210, 4.1534, 3.4966, 3.1596, 3.5983, 3.1420], device='cuda:0')
2023-05-10 16:25:42,281 WARNING [train.py:1182] (0/2) Exclude cut with ID 432-122774-0017-62487-0 from training. Duration: 22.395
2023-05-10 16:25:46,703 INFO [train.py:1021] (0/2) Epoch 31, batch 3100, loss[loss=0.1738, simple_loss=0.265, pruned_loss=0.04127, over 37050.00 frames. ], tot_loss[loss=0.1714, simple_loss=0.2582, pruned_loss=0.04228, over 7113303.35 frames. ], batch size: 110, lr: 3.52e-03, grad_scale: 16.0
2023-05-10 16:25:48,272 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.890e+02 3.495e+02 3.799e+02 4.441e+02 8.344e+02, threshold=7.597e+02, percent-clipped=1.0
2023-05-10 16:25:55,954 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=560910.0, ans=0.125
2023-05-10 16:25:57,281 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0045-15876-0 from training. Duration: 21.075
2023-05-10 16:25:57,581 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=560910.0, ans=0.125
2023-05-10 16:26:03,278 WARNING [train.py:1182] (0/2) Exclude cut with ID 6482-98857-0025-147532-0_sp0.9 from training. Duration: 20.0055625
2023-05-10 16:26:03,290 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0037-132304-0_sp0.9 from training. Duration: 22.05
2023-05-10 16:26:03,311 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0003-134302-0 from training. Duration: 26.8349375
2023-05-10 16:26:03,525 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=560960.0, ans=0.125
2023-05-10 16:26:04,938 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=560960.0, ans=0.2
2023-05-10 16:26:06,371 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0009-15840-0_sp1.1 from training. Duration: 22.1090625
2023-05-10 16:26:15,502 WARNING [train.py:1182] (0/2) Exclude cut with ID 7699-105389-0094-26379-0_sp0.9 from training. Duration: 26.6166875
2023-05-10 16:26:30,594 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=561010.0, ans=0.125
2023-05-10 16:26:36,176 WARNING [train.py:1182] (0/2) Exclude cut with ID 2046-178027-0000-53705-0_sp0.9 from training. Duration: 20.3055625
2023-05-10 16:26:37,894 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=561060.0, ans=0.0
2023-05-10 16:26:48,586 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=561110.0, ans=0.025
2023-05-10 16:27:00,409 WARNING [train.py:1182] (0/2) Exclude cut with ID 7205-50138-0008-5373-0_sp0.9 from training. Duration: 20.7
2023-05-10 16:27:00,721 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.6116, 5.4185, 4.8389, 5.2763], device='cuda:0')
2023-05-10 16:27:03,347 INFO [train.py:1021] (0/2) Epoch 31, batch 3150, loss[loss=0.1659, simple_loss=0.2528, pruned_loss=0.03946, over 36870.00 frames. ], tot_loss[loss=0.1718, simple_loss=0.2589, pruned_loss=0.04233, over 7139955.19 frames. ], batch size: 96, lr: 3.52e-03, grad_scale: 16.0
2023-05-10 16:27:44,598 WARNING [train.py:1182] (0/2) Exclude cut with ID 6978-92210-0019-146985-0 from training. Duration: 22.48
2023-05-10 16:28:00,982 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0003-134302-0_sp0.9 from training. Duration: 29.816625
2023-05-10 16:28:09,012 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=561360.0, ans=0.0
2023-05-10 16:28:19,928 INFO [train.py:1021] (0/2) Epoch 31, batch 3200, loss[loss=0.1732, simple_loss=0.2585, pruned_loss=0.04395, over 37149.00 frames. ], tot_loss[loss=0.1717, simple_loss=0.2588, pruned_loss=0.04227, over 7133139.46 frames. ], batch size: 98, lr: 3.52e-03, grad_scale: 32.0
2023-05-10 16:28:21,419 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.789e+02 3.475e+02 3.864e+02 4.298e+02 6.134e+02, threshold=7.729e+02, percent-clipped=0.0
2023-05-10 16:28:23,066 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0005-134304-0_sp1.1 from training. Duration: 22.7590625
2023-05-10 16:28:23,360 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=561410.0, ans=0.0
2023-05-10 16:28:29,096 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0026-15857-0 from training. Duration: 22.555
2023-05-10 16:28:48,564 WARNING [train.py:1182] (0/2) Exclude cut with ID 1250-135782-0005-25975-0_sp0.9 from training. Duration: 21.688875
2023-05-10 16:28:54,687 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=561510.0, ans=0.125
2023-05-10 16:29:08,279 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0
2023-05-10 16:29:09,314 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=561560.0, ans=0.0
2023-05-10 16:29:24,492 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561610.0, ans=0.1
2023-05-10 16:29:25,761 WARNING [train.py:1182] (0/2) Exclude cut with ID 3488-85273-0038-41224-0_sp0.9 from training. Duration: 22.6
2023-05-10 16:29:27,550 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561610.0, ans=0.1
2023-05-10 16:29:33,687 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.7453, 2.8848, 4.4932, 2.9912], device='cuda:0')
2023-05-10 16:29:36,292 INFO [train.py:1021] (0/2) Epoch 31, batch 3250, loss[loss=0.1556, simple_loss=0.2381, pruned_loss=0.03654, over 37190.00 frames. ], tot_loss[loss=0.1711, simple_loss=0.2582, pruned_loss=0.04206, over 7120039.88 frames. ], batch size: 93, lr: 3.51e-03, grad_scale: 32.0
2023-05-10 16:29:50,189 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=561710.0, ans=0.125
2023-05-10 16:30:04,586 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.3784, 5.7449, 5.5734, 6.1416], device='cuda:0')
2023-05-10 16:30:05,972 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0009-15840-0 from training. Duration: 24.32
2023-05-10 16:30:12,165 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=561760.0, ans=0.0
2023-05-10 16:30:27,077 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.3551, 3.0593, 2.9121, 3.4705, 1.9258, 3.0881, 3.5388, 3.1807], device='cuda:0')
2023-05-10 16:30:52,232 INFO [train.py:1021] (0/2) Epoch 31, batch 3300, loss[loss=0.1488, simple_loss=0.2309, pruned_loss=0.03332, over 35419.00 frames. ], tot_loss[loss=0.1706, simple_loss=0.2577, pruned_loss=0.04177, over 7132402.06 frames. ], batch size: 78, lr: 3.51e-03, grad_scale: 32.0
2023-05-10 16:30:53,659 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.787e+02 3.413e+02 3.831e+02 4.452e+02 6.471e+02, threshold=7.662e+02, percent-clipped=0.0
2023-05-10 16:31:04,331 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-276745-0093-13116-0_sp0.9 from training. Duration: 21.061125
2023-05-10 16:31:09,017 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=561960.0, ans=0.125
2023-05-10 16:31:19,484 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0024-15855-0_sp0.9 from training. Duration: 20.32225
2023-05-10 16:31:22,597 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=562010.0, ans=0.1
2023-05-10 16:31:29,959 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=562010.0, ans=0.125
2023-05-10 16:31:32,722 WARNING [train.py:1182] (0/2) Exclude cut with ID 3033-130750-0096-55598-0_sp1.1 from training. Duration: 0.7545625
2023-05-10 16:31:46,477 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=562060.0, ans=0.125
2023-05-10 16:31:49,187 WARNING [train.py:1182] (0/2) Exclude cut with ID 4295-39940-0007-92567-0_sp0.9 from training. Duration: 23.9333125
2023-05-10 16:31:50,933 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.5461, 4.8890, 5.0122, 4.6884], device='cuda:0')
2023-05-10 16:32:08,416 INFO [train.py:1021] (0/2) Epoch 31, batch 3350, loss[loss=0.1541, simple_loss=0.2412, pruned_loss=0.03343, over 37039.00 frames. ], tot_loss[loss=0.1709, simple_loss=0.2578, pruned_loss=0.04196, over 7126075.83 frames. ], batch size: 99, lr: 3.51e-03, grad_scale: 32.0
2023-05-10 16:32:24,966 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0008-134307-0_sp1.1 from training. Duration: 20.17275
2023-05-10 16:32:26,686 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=562210.0, ans=0.025
2023-05-10 16:32:28,968 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.45 vs. limit=15.0
2023-05-10 16:32:29,517 WARNING [train.py:1182] (0/2) Exclude cut with ID 6978-92210-0019-146985-0_sp1.1 from training. Duration: 20.436375
2023-05-10 16:32:31,272 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=562210.0, ans=0.125
2023-05-10 16:33:10,560 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=562360.0, ans=0.125
2023-05-10 16:33:22,882 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=562410.0, ans=0.0
2023-05-10 16:33:24,003 INFO [train.py:1021] (0/2) Epoch 31, batch 3400, loss[loss=0.1681, simple_loss=0.2635, pruned_loss=0.03637, over 37200.00 frames. ], tot_loss[loss=0.1704, simple_loss=0.2574, pruned_loss=0.04169, over 7133567.81 frames. ], batch size: 102, lr: 3.51e-03, grad_scale: 32.0
2023-05-10 16:33:26,168 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.848e+02 3.549e+02 3.917e+02 4.340e+02 6.268e+02, threshold=7.834e+02, percent-clipped=0.0
2023-05-10 16:33:47,956 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=562460.0, ans=0.5
2023-05-10 16:33:53,836 WARNING [train.py:1182] (0/2) Exclude cut with ID 4234-40345-0022-142709-0_sp0.9 from training. Duration: 23.1055625
2023-05-10 16:33:55,423 WARNING [train.py:1182] (0/2) Exclude cut with ID 8291-282929-0007-12994-0_sp1.1 from training. Duration: 23.5
2023-05-10 16:34:07,274 WARNING [train.py:1182] (0/2) Exclude cut with ID 7255-291500-0009-134308-0_sp0.9 from training. Duration: 26.62775
2023-05-10 16:34:11,733 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=562560.0, ans=0.125
2023-05-10 16:34:13,288 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=562560.0, ans=0.125
2023-05-10 16:34:21,388 WARNING [train.py:1182] (0/2) Exclude cut with ID 6951-79737-0018-132285-0 from training. Duration: 21.105
2023-05-10 16:34:27,903 WARNING [train.py:1182] (0/2) Exclude cut with ID 4511-76322-0006-80011-0_sp0.9 from training. Duration: 24.411125
2023-05-10 16:34:41,243 INFO [train.py:1021] (0/2) Epoch 31, batch 3450, loss[loss=0.1746, simple_loss=0.268, pruned_loss=0.0406, over 36900.00 frames. ], tot_loss[loss=0.1705, simple_loss=0.2575, pruned_loss=0.04174, over 7143993.97 frames. ], batch size: 105, lr: 3.51e-03, grad_scale: 32.0
2023-05-10 16:34:53,466 WARNING [train.py:1182] (0/2) Exclude cut with ID 6758-72288-0033-108368-0_sp1.1 from training. Duration: 21.263625
2023-05-10 16:35:28,901 WARNING [train.py:1182] (0/2) Exclude cut with ID 4234-40345-0022-142709-0 from training. Duration: 20.795
2023-05-10 16:35:39,334 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0021-15852-0 from training. Duration: 24.76
2023-05-10 16:35:40,810 WARNING [train.py:1182] (0/2) Exclude cut with ID 3867-173237-0077-144769-0_sp0.9 from training. Duration: 22.25
2023-05-10 16:35:54,685 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=562860.0, ans=0.125
2023-05-10 16:35:57,250 INFO [train.py:1021] (0/2) Epoch 31, batch 3500, loss[loss=0.1572, simple_loss=0.2375, pruned_loss=0.03846, over 36747.00 frames. ], tot_loss[loss=0.1707, simple_loss=0.2576, pruned_loss=0.04187, over 7124348.80 frames. ], batch size: 89, lr: 3.51e-03, grad_scale: 32.0
2023-05-10 16:35:58,770 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.956e+02 3.547e+02 4.086e+02 4.716e+02 7.284e+02, threshold=8.173e+02, percent-clipped=0.0
2023-05-10 16:36:07,275 WARNING [train.py:1182] (0/2) Exclude cut with ID 7357-94126-0026-15857-0_sp1.1 from training. Duration: 20.5045625
2023-05-10 16:36:38,058 INFO [zipformer.py:1666] (0/2) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.1723, 5.5334, 5.3368, 5.9448], device='cuda:0')
2023-05-10 16:36:39,925 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0
2023-05-10 16:37:12,375 INFO [train.py:1021] (0/2) Epoch 31, batch 3550, loss[loss=0.1479, simple_loss=0.2374, pruned_loss=0.02923, over 36941.00 frames. ], tot_loss[loss=0.1706, simple_loss=0.2577, pruned_loss=0.04177, over 7143899.16 frames. ], batch size: 95, lr: 3.51e-03, grad_scale: 32.0
2023-05-10 16:37:12,674 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=563160.0, ans=0.07
2023-05-10 16:37:42,495 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=563260.0, ans=0.025
2023-05-10 16:37:58,498 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=563310.0, ans=0.125
2023-05-10 16:38:13,492 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0
2023-05-10 16:38:25,642 INFO [train.py:1021] (0/2) Epoch 31, batch 3600, loss[loss=0.1832, simple_loss=0.2727, pruned_loss=0.04687, over 36323.00 frames. ], tot_loss[loss=0.1706, simple_loss=0.2579, pruned_loss=0.04171, over 7137767.71 frames. ], batch size: 126, lr: 3.51e-03, grad_scale: 32.0
2023-05-10 16:38:26,995 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.781e+02 3.444e+02 3.876e+02 4.601e+02 6.604e+02, threshold=7.752e+02, percent-clipped=0.0
2023-05-10 16:38:32,851 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=563410.0, ans=0.02
2023-05-10 16:38:51,147 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=563460.0, ans=0.0
2023-05-10 16:38:58,568 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0
2023-05-10 16:39:06,648 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=15.0
2023-05-10 16:39:16,487 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp1119-smaller-md1500/epoch-31.pt
2023-05-10 16:39:33,659 WARNING [train.py:1182] (0/2) Exclude cut with ID 7859-102521-0017-7548-0_sp1.1 from training. Duration: 22.2954375
2023-05-10 16:39:37,853 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=563590.0, ans=0.125
2023-05-10 16:39:37,911 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=563590.0, ans=0.125
2023-05-10 16:39:38,968 INFO [train.py:1021] (0/2) Epoch 32, batch 0, loss[loss=0.1978, simple_loss=0.2849, pruned_loss=0.05542, over 23858.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2849, pruned_loss=0.05542, over 23858.00 frames. ], batch size: 234, lr: 3.45e-03, grad_scale: 32.0
2023-05-10 16:39:38,969 INFO [train.py:1048] (0/2) Computing validation loss
2023-05-10 16:39:49,756 INFO [train.py:1057] (0/2) Epoch 32, validation: loss=0.1529, simple_loss=0.2541, pruned_loss=0.02586, over 944034.00 frames.
2023-05-10 16:39:49,757 INFO [train.py:1058] (0/2) Maximum memory allocated so far is 18682MB
2023-05-10 16:40:11,149 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=563640.0, ans=0.125
2023-05-10 16:40:13,496 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=15.0
2023-05-10 16:40:35,058 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=563740.0, ans=0.0
2023-05-10 16:40:40,947 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=563740.0, ans=0.125
2023-05-10 16:40:43,770 WARNING [train.py:1182] (0/2) Exclude cut with ID 298-126791-0067-24026-0_sp0.9 from training. Duration: 21.438875
2023-05-10 16:40:49,803 WARNING [train.py:1182] (0/2) Exclude cut with ID 5652-39938-0025-23684-0_sp0.9 from training. Duration: 22.2055625
2023-05-10 16:40:53,122 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=563790.0, ans=0.2
2023-05-10 16:40:58,077 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.46 vs. limit=10.0
2023-05-10 16:41:06,138 INFO [train.py:1021] (0/2) Epoch 32, batch 50, loss[loss=0.1717, simple_loss=0.2715, pruned_loss=0.03592, over 37116.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2585, pruned_loss=0.03612, over 1637869.14 frames. ], batch size: 107, lr: 3.45e-03, grad_scale: 32.0
2023-05-10 16:41:24,602 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=563890.0, ans=0.025
2023-05-10 16:41:28,763 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.520e+02 3.172e+02 3.790e+02 4.255e+02 6.631e+02, threshold=7.580e+02, percent-clipped=0.0
2023-05-10 16:41:48,748 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=8.00 vs. limit=22.5
2023-05-10 16:41:57,553 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=563990.0, ans=0.125
2023-05-10 16:42:11,291 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=564040.0, ans=0.125
2023-05-10 16:42:22,840 INFO [train.py:1021] (0/2) Epoch 32, batch 100, loss[loss=0.1757, simple_loss=0.2732, pruned_loss=0.03913, over 36303.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2577, pruned_loss=0.03651, over 2870490.00 frames. ], batch size: 126, lr: 3.45e-03, grad_scale: 32.0
2023-05-10 16:42:23,018 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=564090.0, ans=0.1
2023-05-10 16:42:51,960 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0
2023-05-10 16:43:03,634 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.38 vs. limit=12.0
2023-05-10 16:43:07,864 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=564240.0, ans=0.125
2023-05-10 16:43:39,892 INFO [train.py:1021] (0/2) Epoch 32, batch 150, loss[loss=0.1652, simple_loss=0.2548, pruned_loss=0.03777, over 37142.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2572, pruned_loss=0.03598, over 3846560.09 frames. ], batch size: 98, lr: 3.45e-03, grad_scale: 32.0
2023-05-10 16:43:47,968 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=564340.0, ans=12.0
2023-05-10 16:44:01,998 INFO [optim.py:478] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 2.946e+02 3.285e+02 4.107e+02 5.787e+02, threshold=6.569e+02, percent-clipped=0.0
2023-05-10 16:44:03,725 WARNING [train.py:1182] (0/2) Exclude cut with ID 7859-102521-0017-7548-0 from training. Duration: 24.525
2023-05-10 16:44:14,556 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=564440.0, ans=0.125
2023-05-10 16:44:39,921 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=564540.0, ans=0.2
2023-05-10 16:44:41,117 WARNING [train.py:1182] (0/2) Exclude cut with ID 3699-47246-0007-3408-0_sp0.9 from training. Duration: 20.26675
2023-05-10 16:44:51,744 INFO [scaling.py:178] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=564540.0, ans=0.0
2023-05-10 16:44:56,021 INFO [train.py:1021] (0/2) Epoch 32, batch 200, loss[loss=0.153, simple_loss=0.243, pruned_loss=0.03145, over 36851.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2553, pruned_loss=0.03532, over 4611293.82 frames. ], batch size: 96, lr: 3.45e-03, grad_scale: 32.0
2023-05-10 16:44:56,149 WARNING [train.py:1182] (0/2) Exclude cut with ID 7859-102521-0017-7548-0_sp0.9 from training. Duration: 27.25
2023-05-10 16:45:25,659 INFO [scaling.py:969] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.20 vs. limit=15.0
2023-05-10 16:45:45,545 INFO [checkpoint.py:75] (0/2) Saving checkpoint to pruned_transducer_stateless7/exp1119-smaller-md1500/bad-model-0.pt
2023-05-10 16:45:46,958 INFO [train.py:1307] (0/2) Saving batch to pruned_transducer_stateless7/exp1119-smaller-md1500/batch-7aeff54e-808c-46a6-1f49-2fba47a1fca7.pt
2023-05-10 16:45:47,191 INFO [train.py:1313] (0/2) features shape: torch.Size([88, 1703, 80])
2023-05-10 16:45:47,200 INFO [train.py:1317] (0/2) num tokens: 7191