2024-03-09 12:56:09,176 INFO [train.py:1065] (0/4) Training started
2024-03-09 12:56:09,193 INFO [train.py:1075] (0/4) Device: cuda:0
2024-03-09 12:56:09,282 INFO [lexicon.py:168] (0/4) Loading pre-compiled data/lang_char/Linv.pt
2024-03-09 12:56:09,334 INFO [train.py:1086] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2989b0b1186fa6022932804f5b39fbb2781ebf42', 'k2-git-date': 'Fri Nov 24 11:34:10 2023', 'lhotse-version': '1.22.0.dev+git.d8ed1bbb.dirty', 'torch-version': '1.11.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'dev/mdcc', 'icefall-git-sha1': 'f62fc7f0-clean', 'icefall-git-date': 'Sat Mar 9 12:55:42 2024', 'icefall-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/icefall-1.0-py3.9.egg', 'k2-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/k2-1.24.4.dev20231207+cuda10.2.torch1.11.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/lhotse-1.22.0.dev0+git.d8ed1bbb.dirty-py3.9.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-1207150844-f49d8c4f4-c49d5', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 30, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp'), 'lang_dir': PosixPath('data/lang_char'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 1, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'blank_id': 0, 'vocab_size': 4852}
2024-03-09 12:56:09,334 INFO [train.py:1088] (0/4) About to create model
2024-03-09 12:56:09,995 INFO [train.py:1092] (0/4) Number of model parameters: 74470867
2024-03-09 12:56:14,924 INFO [train.py:1107] (0/4) Using DDP
2024-03-09 12:56:15,509 INFO [asr_datamodule.py:368] (0/4) About to get train cuts
2024-03-09 12:56:15,622 INFO [asr_datamodule.py:376] (0/4) About to get valid cuts
2024-03-09 12:56:15,640 INFO [asr_datamodule.py:195] (0/4) About to get Musan cuts
2024-03-09 12:56:18,183 INFO [asr_datamodule.py:200] (0/4) Enable MUSAN
2024-03-09 12:56:18,183 INFO [asr_datamodule.py:223] (0/4) Enable SpecAugment
2024-03-09 12:56:18,183 INFO [asr_datamodule.py:224] (0/4) Time warp factor: 80
2024-03-09 12:56:18,184 INFO [asr_datamodule.py:234] (0/4) Num frame mask: 10
2024-03-09 12:56:18,184 INFO [asr_datamodule.py:247] (0/4) About to create train dataset
2024-03-09 12:56:18,184 INFO [asr_datamodule.py:273] (0/4) Using DynamicBucketingSampler.
2024-03-09 12:56:19,023 INFO [asr_datamodule.py:290] (0/4) About to create train dataloader
2024-03-09 12:56:19,023 INFO [asr_datamodule.py:315] (0/4) About to create dev dataset
2024-03-09 12:56:19,346 INFO [asr_datamodule.py:332] (0/4) About to create dev dataloader
2024-03-09 12:57:18,484 INFO [train.py:997] (0/4) Epoch 1, batch 0, loss[loss=10.43, simple_loss=9.503, pruned_loss=9.26, over 23353.00 frames. ], tot_loss[loss=10.43, simple_loss=9.503, pruned_loss=9.26, over 23353.00 frames. ], batch size: 102, lr: 2.25e-02, grad_scale: 1.0
2024-03-09 12:57:18,486 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 12:57:28,778 INFO [train.py:1029] (0/4) Epoch 1, validation: loss=10.41, simple_loss=9.49, pruned_loss=9.134, over 452978.00 frames. 
2024-03-09 12:57:28,779 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 25901MB
2024-03-09 12:57:35,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=0.0, ans=5.0
2024-03-09 12:57:38,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=0.0, ans=0.3
2024-03-09 12:57:42,630 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.90 vs. limit=5.0
2024-03-09 12:57:42,680 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=29.82 vs. limit=7.5
2024-03-09 12:57:45,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=66.66666666666667, ans=0.1975
2024-03-09 12:57:49,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=66.66666666666667, ans=0.0985
2024-03-09 12:57:52,244 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.247e+03 5.651e+03 5.908e+03 6.903e+03 6.981e+03, threshold=2.363e+04, percent-clipped=0.0
2024-03-09 12:57:58,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=66.66666666666667, ans=0.496875
2024-03-09 12:57:58,667 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=107.87 vs. limit=7.525
2024-03-09 12:58:10,355 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+03 3.453e+03 5.651e+03 6.615e+03 7.215e+03, threshold=2.260e+04, percent-clipped=0.0
2024-03-09 12:58:13,213 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=511.69 vs. limit=7.6
2024-03-09 12:58:15,224 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=232.33 vs. limit=7.6
2024-03-09 12:58:18,454 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=87.97 vs. limit=4.053333333333334
2024-03-09 12:58:24,071 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=223.96 vs. limit=7.575
2024-03-09 12:58:31,644 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=149.27 vs. limit=7.575
2024-03-09 12:58:34,666 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=254.66 vs. limit=7.65
2024-03-09 12:58:35,005 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=277.45 vs. limit=7.575
2024-03-09 12:58:45,198 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=72.53 vs. limit=4.1066666666666665
2024-03-09 12:58:46,102 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.817e+02 1.921e+03 2.306e+03 5.651e+03 7.215e+03, threshold=9.223e+03, percent-clipped=0.0
2024-03-09 12:58:52,634 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=168.22 vs. limit=5.133333333333334
2024-03-09 12:58:59,108 INFO [train.py:997] (0/4) Epoch 1, batch 50, loss[loss=1.111, simple_loss=0.9911, pruned_loss=1.077, over 20140.00 frames. ], tot_loss[loss=3.869, simple_loss=3.562, pruned_loss=3.019, over 1065856.81 frames. ], batch size: 61, lr: 2.48e-02, grad_scale: 0.25
2024-03-09 12:59:00,044 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=367.12 vs. limit=7.625
2024-03-09 12:59:01,816 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=342.93 vs. limit=7.75
2024-03-09 12:59:05,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=333.3333333333333, ans=0.8883333333333333
2024-03-09 12:59:11,230 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=468.64 vs. limit=7.625
2024-03-09 12:59:18,026 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=231.76 vs. limit=7.8
2024-03-09 12:59:18,181 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=3.06
2024-03-09 12:59:20,233 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=3.06
2024-03-09 12:59:20,261 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=338.60 vs. limit=7.65
2024-03-09 12:59:26,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=400.0, ans=0.20600000000000002
2024-03-09 12:59:26,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=400.0, ans=0.296
2024-03-09 12:59:32,398 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=315.58 vs. limit=5.2
2024-03-09 12:59:48,062 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=112.75 vs. limit=7.675
2024-03-09 13:00:01,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=533.3333333333334, ans=0.29466666666666663
2024-03-09 13:00:02,712 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=229.75 vs. limit=7.7
2024-03-09 13:00:04,724 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=4.213333333333333
2024-03-09 13:00:08,200 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=87.24 vs. limit=7.9
2024-03-09 13:00:13,532 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=7.42 vs. limit=4.24
2024-03-09 13:00:22,131 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=239.35 vs. limit=7.725
2024-03-09 13:00:23,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=600.0, ans=0.294
2024-03-09 13:00:24,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=600.0, ans=5.3
2024-03-09 13:00:31,788 INFO [train.py:997] (0/4) Epoch 1, batch 100, loss[loss=1.057, simple_loss=0.9247, pruned_loss=1.073, over 24275.00 frames. ], tot_loss[loss=2.348, simple_loss=2.139, pruned_loss=1.952, over 1881556.47 frames. ], batch size: 267, lr: 2.70e-02, grad_scale: 0.5
2024-03-09 13:00:32,876 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.32 vs. limit=4.266666666666667
2024-03-09 13:00:37,048 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.046e+01 9.193e+01 2.011e+02 2.156e+03 7.215e+03, threshold=4.023e+02, percent-clipped=0.0
2024-03-09 13:00:41,780 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=64.02 vs. limit=7.75
2024-03-09 13:00:43,219 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=56.94 vs. limit=5.333333333333333
2024-03-09 13:00:51,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=733.3333333333334, ans=0.8743333333333334
2024-03-09 13:00:56,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=733.3333333333334, ans=0.04770833333333334
2024-03-09 13:01:02,529 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=13.14 vs. limit=5.183333333333334
2024-03-09 13:01:06,340 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=26.22 vs. limit=8.1
2024-03-09 13:01:10,128 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=49.03 vs. limit=7.8
2024-03-09 13:01:14,956 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=50.68 vs. limit=8.1
2024-03-09 13:01:29,550 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=160.24 vs. limit=7.825
2024-03-09 13:01:53,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=933.3333333333334, ans=0.04708333333333334
2024-03-09 13:01:53,657 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=16.97 vs. limit=5.233333333333333
2024-03-09 13:02:03,201 INFO [train.py:997] (0/4) Epoch 1, batch 150, loss[loss=0.9259, simple_loss=0.7907, pruned_loss=0.9827, over 24134.00 frames. ], tot_loss[loss=1.782, simple_loss=1.604, pruned_loss=1.567, over 2516736.17 frames. ], batch size: 176, lr: 2.93e-02, grad_scale: 0.5
2024-03-09 13:02:11,048 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=218.28 vs. limit=7.875
2024-03-09 13:02:12,725 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=4.4
2024-03-09 13:02:14,113 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=70.25 vs. limit=8.25
2024-03-09 13:02:14,486 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=56.43 vs. limit=7.875
2024-03-09 13:02:16,403 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-1.pt
2024-03-09 13:03:01,373 INFO [train.py:997] (0/4) Epoch 2, batch 0, loss[loss=1.018, simple_loss=0.8779, pruned_loss=1.021, over 23797.00 frames. ], tot_loss[loss=1.018, simple_loss=0.8779, pruned_loss=1.021, over 23797.00 frames. ], batch size: 447, lr: 2.91e-02, grad_scale: 1.0
2024-03-09 13:03:01,374 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:03:11,801 INFO [train.py:1029] (0/4) Epoch 2, validation: loss=0.9516, simple_loss=0.8161, pruned_loss=0.9787, over 452978.00 frames. 
2024-03-09 13:03:11,802 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:03:14,878 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=363.92 vs. limit=7.895
2024-03-09 13:03:30,549 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=63.67 vs. limit=7.92
2024-03-09 13:03:35,802 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=44.64 vs. limit=8.34
2024-03-09 13:03:35,858 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=193.72 vs. limit=7.92
2024-03-09 13:03:39,280 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=114.96 vs. limit=7.92
2024-03-09 13:03:45,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1186.6666666666667, ans=0.444375
2024-03-09 13:03:45,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1186.6666666666667, ans=0.8584666666666667
2024-03-09 13:03:46,654 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=362.60 vs. limit=7.945
2024-03-09 13:03:47,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1186.6666666666667, ans=0.04629166666666667
2024-03-09 13:03:52,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1186.6666666666667, ans=0.1555
2024-03-09 13:04:06,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1253.3333333333333, ans=0.09216666666666667
2024-03-09 13:04:07,408 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=28.63 vs. limit=7.97
2024-03-09 13:04:07,600 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=80.10 vs. limit=7.97
2024-03-09 13:04:07,894 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=24.08 vs. limit=7.97
2024-03-09 13:04:16,360 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=7.97
2024-03-09 13:04:22,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1320.0, ans=0.8538
2024-03-09 13:04:25,867 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=230.87 vs. limit=7.995
2024-03-09 13:04:29,204 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=64.66 vs. limit=5.66
2024-03-09 13:04:31,668 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 4.992e+01 8.885e+01 1.035e+02 1.288e+02 2.193e+02, threshold=2.069e+02, percent-clipped=0.0
2024-03-09 13:04:36,676 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=97.09 vs. limit=7.995
2024-03-09 13:04:39,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1320.0, ans=0.8538
2024-03-09 13:04:42,154 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=38.92 vs. limit=8.02
2024-03-09 13:04:42,729 INFO [train.py:997] (0/4) Epoch 2, batch 50, loss[loss=0.9398, simple_loss=0.8078, pruned_loss=0.898, over 23692.00 frames. ], tot_loss[loss=0.9102, simple_loss=0.778, pruned_loss=0.9183, over 1074146.93 frames. ], batch size: 486, lr: 3.13e-02, grad_scale: 1.0
2024-03-09 13:04:43,624 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=71.38 vs. limit=8.02
2024-03-09 13:04:56,020 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.02 vs. limit=5.693333333333333
2024-03-09 13:04:57,963 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.01 vs. limit=5.346666666666667
2024-03-09 13:04:59,627 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=47.54 vs. limit=8.59
2024-03-09 13:05:04,752 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=4.581333333333333
2024-03-09 13:05:10,192 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=5.363333333333333
2024-03-09 13:05:17,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1453.3333333333333, ans=0.28546666666666665
2024-03-09 13:05:22,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1520.0, ans=0.14300000000000002
2024-03-09 13:05:31,745 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.24 vs. limit=8.07
2024-03-09 13:05:47,625 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=81.22 vs. limit=8.69
2024-03-09 13:05:49,321 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=53.78 vs. limit=8.69
2024-03-09 13:05:59,965 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=44.57 vs. limit=8.74
2024-03-09 13:06:02,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1653.3333333333333, ans=0.138
2024-03-09 13:06:07,183 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=16.18 vs. limit=8.12
2024-03-09 13:06:09,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1653.3333333333333, ans=0.4225
2024-03-09 13:06:11,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1653.3333333333333, ans=0.4225
2024-03-09 13:06:16,168 INFO [train.py:997] (0/4) Epoch 2, batch 100, loss[loss=0.9075, simple_loss=0.7764, pruned_loss=0.8376, over 23795.00 frames. ], tot_loss[loss=0.8809, simple_loss=0.7521, pruned_loss=0.8645, over 1877834.46 frames. ], batch size: 447, lr: 3.35e-02, grad_scale: 2.0
2024-03-09 13:06:21,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1720.0, ans=0.419375
2024-03-09 13:06:34,717 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.52 vs. limit=5.446666666666666
2024-03-09 13:06:36,721 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=8.84
2024-03-09 13:06:41,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1786.6666666666667, ans=0.41625
2024-03-09 13:06:52,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1853.3333333333333, ans=0.2683333333333333
2024-03-09 13:07:02,209 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=68.35 vs. limit=8.89
2024-03-09 13:07:22,188 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=23.25 vs. limit=8.22
2024-03-09 13:07:25,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1920.0, ans=0.41000000000000003
2024-03-09 13:07:26,233 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=21.12 vs. limit=5.96
2024-03-09 13:07:26,415 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=50.27 vs. limit=8.22
2024-03-09 13:07:35,929 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=195.77 vs. limit=8.245
2024-03-09 13:07:37,966 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.386e+01 8.999e+01 1.029e+02 1.187e+02 2.200e+02, threshold=2.058e+02, percent-clipped=1.0
2024-03-09 13:07:46,563 INFO [train.py:997] (0/4) Epoch 2, batch 150, loss[loss=0.8338, simple_loss=0.7059, pruned_loss=0.763, over 23219.00 frames. ], tot_loss[loss=0.8662, simple_loss=0.7386, pruned_loss=0.8275, over 2515639.78 frames. ], batch size: 102, lr: 3.57e-02, grad_scale: 2.0
2024-03-09 13:07:47,938 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.85 vs. limit=8.27
2024-03-09 13:07:59,675 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-2.pt
2024-03-09 13:08:43,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2106.6666666666665, ans=0.8262666666666667
2024-03-09 13:08:44,927 INFO [train.py:997] (0/4) Epoch 3, batch 0, loss[loss=0.7839, simple_loss=0.6614, pruned_loss=0.7208, over 23163.00 frames. ], tot_loss[loss=0.7839, simple_loss=0.6614, pruned_loss=0.7208, over 23163.00 frames. ], batch size: 102, lr: 3.42e-02, grad_scale: 4.0
2024-03-09 13:08:44,928 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:08:54,190 INFO [train.py:1029] (0/4) Epoch 3, validation: loss=0.8556, simple_loss=0.7313, pruned_loss=0.7513, over 452978.00 frames. 
2024-03-09 13:08:54,190 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:08:55,429 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=35.47 vs. limit=8.29
2024-03-09 13:09:00,497 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=9.08
2024-03-09 13:09:01,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2106.6666666666665, ans=0.2366666666666667
2024-03-09 13:09:09,363 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=4.842666666666666
2024-03-09 13:09:16,520 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=39.43 vs. limit=8.315
2024-03-09 13:09:19,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2173.3333333333335, ans=0.0511
2024-03-09 13:09:22,046 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=53.72 vs. limit=6.086666666666667
2024-03-09 13:09:25,400 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=27.77 vs. limit=9.13
2024-03-09 13:09:35,460 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.22 vs. limit=6.12
2024-03-09 13:09:53,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2306.6666666666665, ans=0.04279166666666667
2024-03-09 13:09:54,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2306.6666666666665, ans=0.391875
2024-03-09 13:10:26,707 INFO [train.py:997] (0/4) Epoch 3, batch 50, loss[loss=0.7973, simple_loss=0.6776, pruned_loss=0.6859, over 19767.00 frames. ], tot_loss[loss=0.8008, simple_loss=0.6802, pruned_loss=0.7039, over 1068611.49 frames. ], batch size: 59, lr: 3.63e-02, grad_scale: 4.0
2024-03-09 13:10:38,491 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=30.10 vs. limit=8.415
2024-03-09 13:10:39,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2440.0, ans=0.042375
2024-03-09 13:10:42,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2506.6666666666665, ans=0.3825
2024-03-09 13:10:42,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2506.6666666666665, ans=0.2376
2024-03-09 13:10:48,921 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.42 vs. limit=5.626666666666667
2024-03-09 13:10:52,645 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=46.87 vs. limit=8.44
2024-03-09 13:11:04,426 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=9.43
2024-03-09 13:11:04,586 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=9.43
2024-03-09 13:11:04,784 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.16 vs. limit=8.465
2024-03-09 13:11:19,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2640.0, ans=0.10099999999999999
2024-03-09 13:11:28,814 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.96 vs. limit=6.32
2024-03-09 13:11:34,500 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.376e+01 1.355e+02 1.829e+02 2.456e+02 5.542e+02, threshold=3.657e+02, percent-clipped=39.0
2024-03-09 13:11:39,146 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=30.81 vs. limit=8.515
2024-03-09 13:11:47,777 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.15 vs. limit=5.676666666666667
2024-03-09 13:11:55,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2773.3333333333335, ans=0.037599999999999995
2024-03-09 13:11:56,951 INFO [train.py:997] (0/4) Epoch 3, batch 100, loss[loss=0.6924, simple_loss=0.5943, pruned_loss=0.5563, over 24243.00 frames. ], tot_loss[loss=0.7716, simple_loss=0.6582, pruned_loss=0.6552, over 1880343.40 frames. ], batch size: 188, lr: 3.84e-02, grad_scale: 8.0
2024-03-09 13:12:03,224 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=9.58
2024-03-09 13:12:12,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2840.0, ans=0.366875
2024-03-09 13:12:18,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2840.0, ans=0.2216
2024-03-09 13:12:19,376 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=8.565
2024-03-09 13:12:21,272 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=9.629999999999999
2024-03-09 13:12:24,381 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=9.629999999999999
2024-03-09 13:12:26,266 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=8.565
2024-03-09 13:12:46,236 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.80 vs. limit=6.453333333333333
2024-03-09 13:12:53,814 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.42 vs. limit=6.486666666666666
2024-03-09 13:13:19,575 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=8.64
2024-03-09 13:13:21,284 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=9.78
2024-03-09 13:13:25,257 INFO [train.py:997] (0/4) Epoch 3, batch 150, loss[loss=0.6125, simple_loss=0.5407, pruned_loss=0.4349, over 24159.00 frames. ], tot_loss[loss=0.7192, simple_loss=0.6192, pruned_loss=0.5825, over 2517629.28 frames. ], batch size: 295, lr: 4.05e-02, grad_scale: 8.0
2024-03-09 13:13:30,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3106.6666666666665, ans=6.941666666666666
2024-03-09 13:13:33,671 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.991e+01
2024-03-09 13:13:35,995 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=22.86 vs. limit=8.665
2024-03-09 13:13:38,207 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-3.pt
2024-03-09 13:14:26,850 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.42 vs. limit=6.58
2024-03-09 13:14:27,472 INFO [train.py:997] (0/4) Epoch 4, batch 0, loss[loss=0.5953, simple_loss=0.5279, pruned_loss=0.4147, over 24073.00 frames. ], tot_loss[loss=0.5953, simple_loss=0.5279, pruned_loss=0.4147, over 24073.00 frames. ], batch size: 365, lr: 3.82e-02, grad_scale: 16.0
2024-03-09 13:14:27,473 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:14:37,768 INFO [train.py:1029] (0/4) Epoch 4, validation: loss=0.515, simple_loss=0.4763, pruned_loss=0.3039, over 452978.00 frames. 
2024-03-09 13:14:37,769 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:14:44,077 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=9.870000000000001
2024-03-09 13:15:00,759 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=8.71
2024-03-09 13:15:10,612 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=9.92
2024-03-09 13:15:18,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3293.3333333333335, ans=0.34562499999999996
2024-03-09 13:15:20,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3293.3333333333335, ans=0.07649999999999998
2024-03-09 13:15:23,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3293.3333333333335, ans=0.34562499999999996
2024-03-09 13:15:26,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3293.3333333333335, ans=0.34562499999999996
2024-03-09 13:15:31,400 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.600e+02 2.775e+02 3.449e+02 4.262e+02 1.233e+03, threshold=6.899e+02, percent-clipped=36.0
2024-03-09 13:15:36,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3360.0, ans=0.3425
2024-03-09 13:15:38,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3360.0, ans=0.3425
2024-03-09 13:15:42,536 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=10.02
2024-03-09 13:15:47,758 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=8.785
2024-03-09 13:15:55,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3426.6666666666665, ans=0.339375
2024-03-09 13:15:57,310 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=10.07
2024-03-09 13:16:02,389 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=10.07
2024-03-09 13:16:07,037 INFO [train.py:997] (0/4) Epoch 4, batch 50, loss[loss=0.4472, simple_loss=0.4138, pruned_loss=0.2608, over 20172.00 frames. ], tot_loss[loss=0.5215, simple_loss=0.4711, pruned_loss=0.3366, over 1061168.05 frames. ], batch size: 60, lr: 3.92e-02, grad_scale: 8.0
2024-03-09 13:16:15,208 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=10.120000000000001
2024-03-09 13:16:25,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3560.0, ans=0.035
2024-03-09 13:16:48,326 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=8.86
2024-03-09 13:16:56,498 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.45 vs. limit=8.86
2024-03-09 13:17:03,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3693.3333333333335, ans=0.21306666666666665
2024-03-09 13:17:04,647 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=10.27
2024-03-09 13:17:24,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3760.0, ans=0.26239999999999997
2024-03-09 13:17:26,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3760.0, ans=0.07
2024-03-09 13:17:26,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3760.0, ans=0.05899999999999997
2024-03-09 13:17:29,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3760.0, ans=0.32375
2024-03-09 13:17:32,094 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=8.935
2024-03-09 13:17:32,113 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=8.935
2024-03-09 13:17:32,782 INFO [train.py:997] (0/4) Epoch 4, batch 100, loss[loss=0.4661, simple_loss=0.434, pruned_loss=0.2632, over 24016.00 frames. ], tot_loss[loss=0.4865, simple_loss=0.4458, pruned_loss=0.2959, over 1885565.16 frames. ], batch size: 388, lr: 3.92e-02, grad_scale: 8.0
2024-03-09 13:17:56,788 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=8.96
2024-03-09 13:18:06,591 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.27 vs. limit=5.99
2024-03-09 13:18:14,833 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=8.985
2024-03-09 13:18:26,601 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.478e+02 2.209e+02 2.728e+02 3.814e+02 7.926e+02, threshold=5.455e+02, percent-clipped=1.0
2024-03-09 13:18:30,780 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=5.610666666666667
2024-03-09 13:18:57,703 INFO [train.py:997] (0/4) Epoch 4, batch 150, loss[loss=0.4913, simple_loss=0.4534, pruned_loss=0.2852, over 23722.00 frames. ], tot_loss[loss=0.4589, simple_loss=0.4257, pruned_loss=0.2654, over 2519530.07 frames. ], batch size: 486, lr: 3.91e-02, grad_scale: 8.0
2024-03-09 13:19:01,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4160.0, ans=0.305
2024-03-09 13:19:02,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4160.0, ans=0.2584
2024-03-09 13:19:10,215 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-4.pt
2024-03-09 13:19:56,274 INFO [train.py:997] (0/4) Epoch 5, batch 0, loss[loss=0.3976, simple_loss=0.3804, pruned_loss=0.2011, over 24158.00 frames. ], tot_loss[loss=0.3976, simple_loss=0.3804, pruned_loss=0.2011, over 24158.00 frames. ], batch size: 366, lr: 3.65e-02, grad_scale: 16.0
2024-03-09 13:19:56,275 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:20:05,955 INFO [train.py:1029] (0/4) Epoch 5, validation: loss=0.3626, simple_loss=0.3682, pruned_loss=0.1368, over 452978.00 frames. 
2024-03-09 13:20:05,956 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:20:37,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4346.666666666667, ans=0.07283333333333333
2024-03-09 13:20:54,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4413.333333333333, ans=0.29312499999999997
2024-03-09 13:21:12,948 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=5.792
2024-03-09 13:21:23,619 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.42 vs. limit=6.12
2024-03-09 13:21:30,491 INFO [train.py:997] (0/4) Epoch 5, batch 50, loss[loss=0.3398, simple_loss=0.3376, pruned_loss=0.1468, over 24103.00 frames. ], tot_loss[loss=0.3685, simple_loss=0.3589, pruned_loss=0.1733, over 1069272.23 frames. ], batch size: 165, lr: 3.64e-02, grad_scale: 8.0
2024-03-09 13:21:32,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4546.666666666667, ans=0.009881159420289855
2024-03-09 13:21:44,554 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=10.91
2024-03-09 13:21:57,639 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=9.23
2024-03-09 13:22:09,206 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 1.970e+02 2.387e+02 3.231e+02 6.932e+02, threshold=4.775e+02, percent-clipped=2.0
2024-03-09 13:22:24,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4746.666666666667, ans=0.7338666666666667
2024-03-09 13:22:27,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4746.666666666667, ans=0.27749999999999997
2024-03-09 13:22:35,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4813.333333333333, ans=0.7315333333333334
2024-03-09 13:22:45,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4813.333333333333, ans=0.27437500000000004
2024-03-09 13:22:47,535 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.973e+00
2024-03-09 13:22:55,078 INFO [train.py:997] (0/4) Epoch 5, batch 100, loss[loss=0.3652, simple_loss=0.3619, pruned_loss=0.1618, over 24171.00 frames. ], tot_loss[loss=0.3607, simple_loss=0.3537, pruned_loss=0.1657, over 1883421.09 frames. ], batch size: 295, lr: 3.64e-02, grad_scale: 8.0
2024-03-09 13:22:56,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4880.0, ans=0.2512
2024-03-09 13:23:14,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4946.666666666667, ans=0.25053333333333333
2024-03-09 13:23:21,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4946.666666666667, ans=0.268125
2024-03-09 13:23:25,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4946.666666666667, ans=0.2742
2024-03-09 13:23:46,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5080.0, ans=0.26187499999999997
2024-03-09 13:23:51,679 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=11.31
2024-03-09 13:23:52,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5080.0, ans=0.26187499999999997
2024-03-09 13:23:55,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5080.0, ans=0.26187499999999997
2024-03-09 13:24:05,289 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=6.058666666666667
2024-03-09 13:24:11,529 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.22 vs. limit=11.36
2024-03-09 13:24:19,171 INFO [train.py:997] (0/4) Epoch 5, batch 150, loss[loss=0.3082, simple_loss=0.3134, pruned_loss=0.1242, over 23983.00 frames. ], tot_loss[loss=0.3569, simple_loss=0.3522, pruned_loss=0.1606, over 2528382.17 frames. ], batch size: 142, lr: 3.64e-02, grad_scale: 8.0
2024-03-09 13:24:32,004 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-5.pt
2024-03-09 13:25:15,972 INFO [train.py:997] (0/4) Epoch 6, batch 0, loss[loss=0.3086, simple_loss=0.3176, pruned_loss=0.1181, over 24218.00 frames. ], tot_loss[loss=0.3086, simple_loss=0.3176, pruned_loss=0.1181, over 24218.00 frames. ], batch size: 198, lr: 3.40e-02, grad_scale: 16.0
2024-03-09 13:25:15,973 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:25:26,278 INFO [train.py:1029] (0/4) Epoch 6, validation: loss=0.3173, simple_loss=0.3385, pruned_loss=0.1003, over 452978.00 frames. 
2024-03-09 13:25:26,279 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:25:53,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5333.333333333333, ans=0.044444444444444446
2024-03-09 13:25:55,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5333.333333333333, ans=0.25
2024-03-09 13:26:01,735 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.753e+02 2.102e+02 2.732e+02 4.816e+02, threshold=4.205e+02, percent-clipped=1.0
2024-03-09 13:26:02,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5333.333333333333, ans=0.25
2024-03-09 13:26:16,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5400.0, ans=0.246
2024-03-09 13:26:26,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5466.666666666667, ans=0.009681159420289855
2024-03-09 13:26:38,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5533.333333333333, ans=0.7063333333333334
2024-03-09 13:26:56,060 INFO [train.py:997] (0/4) Epoch 6, batch 50, loss[loss=0.2924, simple_loss=0.3065, pruned_loss=0.1054, over 23969.00 frames. ], tot_loss[loss=0.3137, simple_loss=0.3218, pruned_loss=0.1231, over 1071719.59 frames. ], batch size: 142, lr: 3.40e-02, grad_scale: 16.0
2024-03-09 13:26:56,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5600.0, ans=0.2375
2024-03-09 13:27:26,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5733.333333333333, ans=0.24266666666666667
2024-03-09 13:27:34,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5733.333333333333, ans=0.24266666666666667
2024-03-09 13:27:46,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5800.0, ans=0.22812500000000002
2024-03-09 13:27:55,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5800.0, ans=0.22812500000000002
2024-03-09 13:28:06,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5866.666666666667, ans=0.042222222222222223
2024-03-09 13:28:17,513 INFO [train.py:997] (0/4) Epoch 6, batch 100, loss[loss=0.2948, simple_loss=0.3098, pruned_loss=0.1078, over 24268.00 frames. ], tot_loss[loss=0.3142, simple_loss=0.3237, pruned_loss=0.1227, over 1890983.73 frames. ], batch size: 254, lr: 3.40e-02, grad_scale: 8.0
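The progress lines also carry grad_scale, which in this log moves between 8.0, 16.0 and 32.0 across epochs (in the line above it is back at 8.0 after 16.0 at the start of the epoch). This is the usual behaviour of a dynamic mixed-precision loss scale that is halved when non-finite gradients appear and grown again after a run of stable steps. The sketch below uses PyTorch's stock torch.cuda.amp.GradScaler as an analogue; it is not the recipe's own scaler, and compute_loss is a placeholder.

```python
# Mixed-precision training step with a dynamic loss scale, analogous to the
# grad_scale values in the progress lines; uses the stock GradScaler rather
# than whatever scaler the recipe itself implements.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()   # gradients are computed on the scaled loss
    scaler.step(optimizer)          # unscales, skips the step if inf/nan gradients appear
    scaler.update()                 # halves the scale on overflow, grows it when stable
    print(f"grad_scale: {scaler.get_scale()}")
```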
2024-03-09 13:28:26,496 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=9.725
2024-03-09 13:28:37,604 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=12.0
2024-03-09 13:28:45,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=6000.0, ans=0.21875
2024-03-09 13:28:47,226 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.031e+02 1.395e+02 1.660e+02 2.447e+02 5.591e+02, threshold=3.319e+02, percent-clipped=4.0
2024-03-09 13:29:17,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=6133.333333333333, ans=0.21250000000000002
2024-03-09 13:29:35,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=6200.0, ans=0.20937499999999998
2024-03-09 13:29:40,034 INFO [train.py:997] (0/4) Epoch 6, batch 150, loss[loss=0.2647, simple_loss=0.2819, pruned_loss=0.09391, over 23806.00 frames. ], tot_loss[loss=0.3088, simple_loss=0.3202, pruned_loss=0.1188, over 2528188.92 frames. ], batch size: 129, lr: 3.39e-02, grad_scale: 8.0
2024-03-09 13:29:52,943 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-6.pt
2024-03-09 13:30:37,223 INFO [train.py:997] (0/4) Epoch 7, batch 0, loss[loss=0.2559, simple_loss=0.2774, pruned_loss=0.08407, over 23593.00 frames. ], tot_loss[loss=0.2559, simple_loss=0.2774, pruned_loss=0.08407, over 23593.00 frames. ], batch size: 128, lr: 3.18e-02, grad_scale: 16.0
2024-03-09 13:30:37,224 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:30:47,284 INFO [train.py:1029] (0/4) Epoch 7, validation: loss=0.2933, simple_loss=0.3253, pruned_loss=0.08566, over 452978.00 frames. 
2024-03-09 13:30:47,285 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:31:20,903 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.93 vs. limit=8.193333333333333
2024-03-09 13:32:16,187 INFO [train.py:997] (0/4) Epoch 7, batch 50, loss[loss=0.2581, simple_loss=0.2825, pruned_loss=0.08409, over 24217.00 frames. ], tot_loss[loss=0.2845, simple_loss=0.3038, pruned_loss=0.1016, over 1055468.09 frames. ], batch size: 229, lr: 3.18e-02, grad_scale: 16.0
2024-03-09 13:32:20,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=6653.333333333333, ans=0.188125
2024-03-09 13:32:27,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=6653.333333333333, ans=0.03894444444444445
2024-03-09 13:32:30,835 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.025e+02 1.360e+02 1.605e+02 1.865e+02 3.683e+02, threshold=3.211e+02, percent-clipped=2.0
2024-03-09 13:32:36,652 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.02
2024-03-09 13:32:44,490 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.93 vs. limit=6.68
2024-03-09 13:32:51,691 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 13:33:05,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=6853.333333333333, ans=0.17875000000000002
2024-03-09 13:33:12,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=6853.333333333333, ans=0.17875000000000002
2024-03-09 13:33:19,168 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=10.095
2024-03-09 13:33:36,452 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=4.048
2024-03-09 13:33:37,020 INFO [train.py:997] (0/4) Epoch 7, batch 100, loss[loss=0.2914, simple_loss=0.3133, pruned_loss=0.1053, over 24106.00 frames. ], tot_loss[loss=0.2803, simple_loss=0.3019, pruned_loss=0.09807, over 1872137.09 frames. ], batch size: 344, lr: 3.18e-02, grad_scale: 16.0
2024-03-09 13:33:51,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=6986.666666666667, ans=0.6554666666666666
2024-03-09 13:34:06,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7053.333333333333, ans=0.22946666666666665
2024-03-09 13:34:27,765 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. limit=6.796666666666667
2024-03-09 13:34:31,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=7186.666666666667, ans=0.16312500000000002
2024-03-09 13:34:49,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=7253.333333333333, ans=0.15999999999999998
2024-03-09 13:34:55,333 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=10.22
2024-03-09 13:34:58,873 INFO [train.py:997] (0/4) Epoch 7, batch 150, loss[loss=0.2432, simple_loss=0.2734, pruned_loss=0.07554, over 23974.00 frames. ], tot_loss[loss=0.2791, simple_loss=0.3024, pruned_loss=0.09717, over 2506566.13 frames. ], batch size: 142, lr: 3.18e-02, grad_scale: 16.0
2024-03-09 13:35:05,428 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=10.245000000000001
2024-03-09 13:35:11,783 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-7.pt
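Each epoch ends with checkpoint.py writing zipformer/exp/epoch-N.pt, as in the line above. A minimal sketch of what such an epoch checkpoint typically bundles is shown below; the exact keys saved by the recipe's checkpoint.py are an assumption here.

```python
# Minimal sketch of epoch checkpointing; the key names are assumptions and
# not necessarily those used by checkpoint.py in this recipe.
import torch

def save_epoch_checkpoint(filename, model, optimizer=None, scheduler=None, epoch=None):
    checkpoint = {
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict() if optimizer is not None else None,
        "scheduler": scheduler.state_dict() if scheduler is not None else None,
        "epoch": epoch,
    }
    torch.save(checkpoint, filename)

# e.g. save_epoch_checkpoint("zipformer/exp/epoch-7.pt", model, optimizer, scheduler, epoch=7)
```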
2024-03-09 13:35:57,455 INFO [train.py:997] (0/4) Epoch 8, batch 0, loss[loss=0.2711, simple_loss=0.3, pruned_loss=0.0905, over 24243.00 frames. ], tot_loss[loss=0.2711, simple_loss=0.3, pruned_loss=0.0905, over 24243.00 frames. ], batch size: 311, lr: 2.99e-02, grad_scale: 32.0
2024-03-09 13:35:57,456 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:36:07,342 INFO [train.py:1029] (0/4) Epoch 8, validation: loss=0.2797, simple_loss=0.3212, pruned_loss=0.07915, over 452978.00 frames. 
2024-03-09 13:36:07,343 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:36:08,863 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.023e+02 1.314e+02 1.638e+02 1.955e+02 4.296e+02, threshold=3.277e+02, percent-clipped=3.0
2024-03-09 13:36:38,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=7440.0, ans=9.65
2024-03-09 13:36:56,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=7573.333333333333, ans=0.035111111111111114
2024-03-09 13:37:15,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=7640.0, ans=0.009208695652173913
2024-03-09 13:37:19,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=7640.0, ans=0.009208695652173913
2024-03-09 13:37:28,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=7640.0, ans=0.1
2024-03-09 13:37:28,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=7640.0, ans=0.034833333333333334
2024-03-09 13:37:31,057 INFO [train.py:997] (0/4) Epoch 8, batch 50, loss[loss=0.3296, simple_loss=0.3471, pruned_loss=0.1336, over 23635.00 frames. ], tot_loss[loss=0.264, simple_loss=0.2946, pruned_loss=0.08656, over 1075406.85 frames. ], batch size: 485, lr: 2.99e-02, grad_scale: 32.0
2024-03-09 13:37:39,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=7706.666666666667, ans=0.6302666666666668
2024-03-09 13:37:47,932 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.97 vs. limit=8.886666666666667
2024-03-09 13:37:50,756 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=10.415
2024-03-09 13:37:52,316 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=6.943333333333333
2024-03-09 13:38:39,073 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=10.49
2024-03-09 13:38:51,043 INFO [train.py:997] (0/4) Epoch 8, batch 100, loss[loss=0.2388, simple_loss=0.2735, pruned_loss=0.07445, over 22787.00 frames. ], tot_loss[loss=0.2595, simple_loss=0.2915, pruned_loss=0.08459, over 1880455.62 frames. ], batch size: 85, lr: 2.99e-02, grad_scale: 32.0
2024-03-09 13:38:52,573 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.761e+01 1.115e+02 1.336e+02 1.652e+02 2.844e+02, threshold=2.672e+02, percent-clipped=0.0
2024-03-09 13:38:58,194 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. limit=7.01
2024-03-09 13:38:59,614 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.89 vs. limit=7.01
2024-03-09 13:39:06,080 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.95 vs. limit=13.530000000000001
2024-03-09 13:39:14,740 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-03-09 13:39:16,977 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=13.58
2024-03-09 13:39:48,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=8240.0, ans=0.125
2024-03-09 13:39:56,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=8306.666666666666, ans=0.21693333333333334
2024-03-09 13:39:57,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=8306.666666666666, ans=0.6092666666666667
2024-03-09 13:40:00,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=8306.666666666666, ans=0.125
2024-03-09 13:40:08,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=8306.666666666666, ans=0.04949747468305833
2024-03-09 13:40:12,947 INFO [train.py:997] (0/4) Epoch 8, batch 150, loss[loss=0.2394, simple_loss=0.2773, pruned_loss=0.07394, over 24264.00 frames. ], tot_loss[loss=0.2576, simple_loss=0.2911, pruned_loss=0.08367, over 2514481.08 frames. ], batch size: 188, lr: 2.99e-02, grad_scale: 16.0
2024-03-09 13:40:25,406 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-8.pt
2024-03-09 13:41:11,732 INFO [train.py:997] (0/4) Epoch 9, batch 0, loss[loss=0.2605, simple_loss=0.2977, pruned_loss=0.0851, over 24156.00 frames. ], tot_loss[loss=0.2605, simple_loss=0.2977, pruned_loss=0.0851, over 24156.00 frames. ], batch size: 366, lr: 2.83e-02, grad_scale: 32.0
2024-03-09 13:41:11,733 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:41:21,825 INFO [train.py:1029] (0/4) Epoch 9, validation: loss=0.2624, simple_loss=0.312, pruned_loss=0.07326, over 452978.00 frames. 
2024-03-09 13:41:21,826 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:41:26,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=8426.666666666666, ans=0.125
2024-03-09 13:42:20,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=8626.666666666666, ans=0.16373333333333334
2024-03-09 13:42:41,967 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.295e+01 1.084e+02 1.217e+02 1.477e+02 3.480e+02, threshold=2.433e+02, percent-clipped=5.0
2024-03-09 13:42:49,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=8760.0, ans=0.125
2024-03-09 13:42:51,114 INFO [train.py:997] (0/4) Epoch 9, batch 50, loss[loss=0.2396, simple_loss=0.2865, pruned_loss=0.06842, over 24059.00 frames. ], tot_loss[loss=0.2398, simple_loss=0.2806, pruned_loss=0.0732, over 1069856.85 frames. ], batch size: 365, lr: 2.83e-02, grad_scale: 32.0
2024-03-09 13:43:06,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=8826.666666666666, ans=0.5910666666666667
2024-03-09 13:43:10,009 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 13:43:33,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=8893.333333333334, ans=0.008936231884057972
2024-03-09 13:43:39,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=8960.0, ans=0.125
2024-03-09 13:43:52,484 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.15 vs. limit=7.256666666666666
2024-03-09 13:44:03,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=9026.666666666666, ans=0.125
2024-03-09 13:44:05,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=9026.666666666666, ans=0.125
2024-03-09 13:44:08,288 INFO [train.py:997] (0/4) Epoch 9, batch 100, loss[loss=0.2258, simple_loss=0.2734, pruned_loss=0.06407, over 23887.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2809, pruned_loss=0.07281, over 1888807.17 frames. ], batch size: 129, lr: 2.83e-02, grad_scale: 32.0
2024-03-09 13:44:17,025 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=14.32
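The scaling.py:1023 lines report a whitening metric for a named activation against a limit that is gradually raised as training proceeds (the logged limits grow from roughly 4-12 in epoch 5 to around 20 by epoch 17). A common way to quantify how far a feature covariance is from white (proportional to the identity) is the ratio of the mean squared eigenvalue to the squared mean eigenvalue: it equals 1.0 for perfectly white features and grows as the spectrum becomes more uneven. The sketch below computes that quantity as an illustration only; it is not claimed to be the exact formula in scaling.py.

```python
# Illustrative "whiteness" metric: mean of squared covariance eigenvalues
# divided by the square of the mean eigenvalue.  Equals 1.0 for a perfectly
# white (identity-proportional) covariance; larger values mean a more uneven
# spectrum.  Not necessarily the exact metric computed in scaling.py.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]            # (num_channels, num_channels)
    eigs = torch.linalg.eigvalsh(cov)         # real eigenvalues of the symmetric cov
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(1000, 384) @ torch.randn(384, 384)   # deliberately correlated features
print(f"metric={whitening_metric(x):.2f}")            # well above 1.0, like the log lines
```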
2024-03-09 13:44:21,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=9093.333333333334, ans=0.5817333333333334
2024-03-09 13:44:27,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=9160.0, ans=0.028500000000000004
2024-03-09 13:44:31,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=9160.0, ans=0.028500000000000004
2024-03-09 13:45:00,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=9293.333333333334, ans=0.125
2024-03-09 13:45:05,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=9293.333333333334, ans=0.04949747468305833
2024-03-09 13:45:07,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9293.333333333334, ans=0.125
2024-03-09 13:45:20,413 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.522e+01 1.120e+02 1.341e+02 1.607e+02 2.660e+02, threshold=2.681e+02, percent-clipped=5.0
2024-03-09 13:45:27,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=9360.0, ans=0.02766666666666667
2024-03-09 13:45:30,101 INFO [train.py:997] (0/4) Epoch 9, batch 150, loss[loss=0.2241, simple_loss=0.2709, pruned_loss=0.06654, over 24266.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2821, pruned_loss=0.07342, over 2526261.32 frames. ], batch size: 229, lr: 2.82e-02, grad_scale: 32.0
2024-03-09 13:45:30,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=9426.666666666666, ans=0.027388888888888893
2024-03-09 13:45:42,606 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-9.pt
2024-03-09 13:46:27,250 INFO [train.py:997] (0/4) Epoch 10, batch 0, loss[loss=0.2235, simple_loss=0.2713, pruned_loss=0.06582, over 24276.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2713, pruned_loss=0.06582, over 24276.00 frames. ], batch size: 254, lr: 2.69e-02, grad_scale: 32.0
2024-03-09 13:46:27,251 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:46:37,029 INFO [train.py:1029] (0/4) Epoch 10, validation: loss=0.2538, simple_loss=0.3122, pruned_loss=0.07122, over 452978.00 frames. 
2024-03-09 13:46:37,030 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
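Each "Computing validation loss" block, like the one ending above, evaluates the model on the same held-out set (452978 frames throughout this log) and then reports the peak GPU memory. The sketch below shows the general shape of such a step using torch.cuda.max_memory_allocated(); the helper names and loss interface are assumptions, not the actual train.py functions.

```python
# Illustrative validation step; function and variable names are assumptions.
import torch

def run_validation(model, valid_loader, compute_loss):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, frames = compute_loss(model, batch)   # per-batch loss and frame count
            tot_loss += loss * frames
            tot_frames += frames
    model.train()
    print(f"validation: loss={tot_loss / tot_frames:.4f}, over {tot_frames:.2f} frames.")
    # Peak memory allocated since the start of the process (or the last reset), in MB.
    print(f"Maximum memory allocated so far is {torch.cuda.max_memory_allocated() // 2**20}MB")
```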
2024-03-09 13:46:45,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=9480.0, ans=0.125
2024-03-09 13:46:50,615 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=11.055
2024-03-09 13:47:23,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=9613.333333333334, ans=0.125
2024-03-09 13:47:24,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9613.333333333334, ans=0.20386666666666667
2024-03-09 13:47:40,482 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=11.129999999999999
2024-03-09 13:47:53,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=9746.666666666666, ans=0.035
2024-03-09 13:48:02,513 INFO [train.py:997] (0/4) Epoch 10, batch 50, loss[loss=0.2043, simple_loss=0.2603, pruned_loss=0.05221, over 24263.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2784, pruned_loss=0.06936, over 1062874.58 frames. ], batch size: 188, lr: 2.68e-02, grad_scale: 32.0
2024-03-09 13:48:32,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=9946.666666666666, ans=0.20053333333333334
2024-03-09 13:48:37,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=9946.666666666666, ans=0.5518666666666667
2024-03-09 13:48:37,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=9946.666666666666, ans=0.025222222222222226
2024-03-09 13:48:43,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=9946.666666666666, ans=0.125
2024-03-09 13:48:58,432 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.812e+01 1.075e+02 1.246e+02 1.479e+02 2.668e+02, threshold=2.491e+02, percent-clipped=0.0
2024-03-09 13:49:00,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=10013.333333333334, ans=0.0
2024-03-09 13:49:21,953 INFO [train.py:997] (0/4) Epoch 10, batch 100, loss[loss=0.2463, simple_loss=0.2959, pruned_loss=0.08028, over 23787.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2761, pruned_loss=0.06721, over 1871919.70 frames. ], batch size: 447, lr: 2.68e-02, grad_scale: 32.0
2024-03-09 13:49:23,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=10146.666666666666, ans=0.125
2024-03-09 13:49:48,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=10213.333333333334, ans=0.125
2024-03-09 13:50:06,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=10280.0, ans=0.125
2024-03-09 13:50:29,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=10413.333333333334, ans=0.2
2024-03-09 13:50:33,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=10413.333333333334, ans=0.5355333333333334
2024-03-09 13:50:43,576 INFO [train.py:997] (0/4) Epoch 10, batch 150, loss[loss=0.2148, simple_loss=0.2699, pruned_loss=0.06309, over 23085.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2763, pruned_loss=0.06681, over 2516959.31 frames. ], batch size: 101, lr: 2.68e-02, grad_scale: 32.0
2024-03-09 13:50:55,789 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-10.pt
2024-03-09 13:51:41,272 INFO [train.py:997] (0/4) Epoch 11, batch 0, loss[loss=0.205, simple_loss=0.262, pruned_loss=0.05741, over 24262.00 frames. ], tot_loss[loss=0.205, simple_loss=0.262, pruned_loss=0.05741, over 24262.00 frames. ], batch size: 208, lr: 2.56e-02, grad_scale: 32.0
2024-03-09 13:51:41,273 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:51:51,065 INFO [train.py:1029] (0/4) Epoch 11, validation: loss=0.2397, simple_loss=0.3066, pruned_loss=0.06689, over 452978.00 frames. 
2024-03-09 13:51:51,066 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:51:57,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=10533.333333333334, ans=0.125
2024-03-09 13:52:29,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=10666.666666666666, ans=0.19333333333333336
2024-03-09 13:52:38,414 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.688e+01 1.049e+02 1.183e+02 1.464e+02 2.170e+02, threshold=2.365e+02, percent-clipped=0.0
2024-03-09 13:52:41,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=10733.333333333334, ans=0.125
2024-03-09 13:52:46,671 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.55
2024-03-09 13:52:47,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=10733.333333333334, ans=0.125
2024-03-09 13:52:58,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=10733.333333333334, ans=0.125
2024-03-09 13:53:18,218 INFO [train.py:997] (0/4) Epoch 11, batch 50, loss[loss=0.2075, simple_loss=0.2698, pruned_loss=0.05717, over 24078.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.2702, pruned_loss=0.0609, over 1066971.02 frames. ], batch size: 344, lr: 2.56e-02, grad_scale: 32.0
2024-03-09 13:53:24,185 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. limit=4.63
2024-03-09 13:53:29,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=10866.666666666666, ans=0.5196666666666667
2024-03-09 13:53:32,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=10933.333333333334, ans=0.02111111111111111
2024-03-09 13:53:34,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=10933.333333333334, ans=0.125
2024-03-09 13:54:07,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=11066.666666666666, ans=0.5126666666666667
2024-03-09 13:54:38,693 INFO [train.py:997] (0/4) Epoch 11, batch 100, loss[loss=0.1802, simple_loss=0.2444, pruned_loss=0.04494, over 23777.00 frames. ], tot_loss[loss=0.21, simple_loss=0.27, pruned_loss=0.06026, over 1892829.29 frames. ], batch size: 117, lr: 2.55e-02, grad_scale: 32.0
2024-03-09 13:54:51,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11200.0, ans=0.188
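The scaling.py:214 lines record module hyperparameters whose value ("ans") is a function of batch_count. The feed_forward dropout_p values in this log fall on a straight line from 0.3 at batch 0 towards 0.1 at batch 20000 (for the line above, 0.3 - 11200/100000 = 0.188), so a piecewise-linear schedule keyed on batch_count reproduces them. The class below is a generic sketch of such a schedule, not the actual ScheduledFloat implementation.

```python
# Generic piecewise-linear schedule keyed on batch_count; an illustration of
# the kind of value the ScheduledFloat log lines report, not scaling.py itself.
import bisect

class PiecewiseLinear:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count) - 1
        x0, x1 = self.xs[i], self.xs[i + 1]
        y0, y1 = self.ys[i], self.ys[i + 1]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
print(f"{dropout_p(11200.0):.3f}")   # 0.188, matching the logged "ans" above
```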
2024-03-09 13:55:14,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11333.333333333334, ans=0.18666666666666665
2024-03-09 13:55:17,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=11333.333333333334, ans=0.125
2024-03-09 13:55:23,737 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.186e+01 9.979e+01 1.131e+02 1.409e+02 2.515e+02, threshold=2.263e+02, percent-clipped=1.0
2024-03-09 13:55:58,234 INFO [train.py:997] (0/4) Epoch 11, batch 150, loss[loss=0.2023, simple_loss=0.272, pruned_loss=0.05463, over 24240.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2715, pruned_loss=0.06134, over 2521132.17 frames. ], batch size: 254, lr: 2.55e-02, grad_scale: 32.0
2024-03-09 13:55:59,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11533.333333333334, ans=0.18466666666666665
2024-03-09 13:56:10,361 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-11.pt
2024-03-09 13:56:55,629 INFO [train.py:997] (0/4) Epoch 12, batch 0, loss[loss=0.1903, simple_loss=0.2616, pruned_loss=0.04806, over 24261.00 frames. ], tot_loss[loss=0.1903, simple_loss=0.2616, pruned_loss=0.04806, over 24261.00 frames. ], batch size: 254, lr: 2.45e-02, grad_scale: 32.0
2024-03-09 13:56:55,630 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:57:03,914 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5088, 3.5142, 3.5759, 2.8665], device='cuda:0')
2024-03-09 13:57:05,244 INFO [train.py:1029] (0/4) Epoch 12, validation: loss=0.2325, simple_loss=0.3061, pruned_loss=0.06737, over 452978.00 frames. 
2024-03-09 13:57:05,244 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:57:27,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=11653.333333333334, ans=0.49213333333333337
2024-03-09 13:57:28,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=11653.333333333334, ans=0.018111111111111106
2024-03-09 13:57:44,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=11720.0, ans=0.125
2024-03-09 13:57:48,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=11720.0, ans=0.125
2024-03-09 13:58:10,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=11786.666666666666, ans=0.4874666666666667
2024-03-09 13:58:17,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=11853.333333333334, ans=0.0
2024-03-09 13:58:28,232 INFO [train.py:997] (0/4) Epoch 12, batch 50, loss[loss=0.1965, simple_loss=0.2713, pruned_loss=0.05139, over 24224.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.2653, pruned_loss=0.05485, over 1077039.37 frames. ], batch size: 327, lr: 2.44e-02, grad_scale: 32.0
2024-03-09 13:58:58,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=12053.333333333334, ans=0.125
2024-03-09 13:58:59,612 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.173e+01 9.982e+01 1.112e+02 1.363e+02 2.435e+02, threshold=2.224e+02, percent-clipped=1.0
2024-03-09 13:59:49,525 INFO [train.py:997] (0/4) Epoch 12, batch 100, loss[loss=0.1862, simple_loss=0.2546, pruned_loss=0.0525, over 24235.00 frames. ], tot_loss[loss=0.1959, simple_loss=0.2653, pruned_loss=0.05464, over 1895207.18 frames. ], batch size: 188, lr: 2.44e-02, grad_scale: 32.0
2024-03-09 14:00:02,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=12253.333333333334, ans=0.38380000000000003
2024-03-09 14:00:11,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=12320.0, ans=0.125
2024-03-09 14:00:20,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12386.666666666666, ans=0.17613333333333334
2024-03-09 14:01:06,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=12520.0, ans=0.008147826086956522
2024-03-09 14:01:06,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=12520.0, ans=0.125
2024-03-09 14:01:09,127 INFO [train.py:997] (0/4) Epoch 12, batch 150, loss[loss=0.1658, simple_loss=0.2463, pruned_loss=0.03773, over 21582.00 frames. ], tot_loss[loss=0.196, simple_loss=0.2664, pruned_loss=0.05543, over 2517775.23 frames. ], batch size: 718, lr: 2.44e-02, grad_scale: 32.0
2024-03-09 14:01:21,367 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-12.pt
2024-03-09 14:02:05,581 INFO [train.py:997] (0/4) Epoch 13, batch 0, loss[loss=0.1865, simple_loss=0.2604, pruned_loss=0.05207, over 24264.00 frames. ], tot_loss[loss=0.1865, simple_loss=0.2604, pruned_loss=0.05207, over 24264.00 frames. ], batch size: 241, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:02:05,582 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:02:18,484 INFO [train.py:1029] (0/4) Epoch 13, validation: loss=0.2245, simple_loss=0.307, pruned_loss=0.06618, over 452978.00 frames. 
2024-03-09 14:02:18,486 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:02:21,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=12640.0, ans=16.98
2024-03-09 14:02:37,308 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.720e+01 1.064e+02 1.199e+02 1.343e+02 2.089e+02, threshold=2.398e+02, percent-clipped=0.0
2024-03-09 14:02:37,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=12706.666666666666, ans=0.013722222222222226
2024-03-09 14:02:51,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=12773.333333333334, ans=0.01344444444444444
2024-03-09 14:03:02,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=12773.333333333334, ans=0.125
2024-03-09 14:03:04,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=12773.333333333334, ans=0.125
2024-03-09 14:03:14,223 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.40 vs. limit=12.315000000000001
2024-03-09 14:03:15,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=12840.0, ans=0.013166666666666667
2024-03-09 14:03:19,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=12840.0, ans=0.125
2024-03-09 14:03:42,232 INFO [train.py:997] (0/4) Epoch 13, batch 50, loss[loss=0.1762, simple_loss=0.2494, pruned_loss=0.0494, over 24227.00 frames. ], tot_loss[loss=0.1848, simple_loss=0.2605, pruned_loss=0.05125, over 1061752.39 frames. ], batch size: 229, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:04:12,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=13106.666666666666, ans=0.125
2024-03-09 14:04:23,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=13106.666666666666, ans=0.125
2024-03-09 14:04:55,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=13240.0, ans=0.43660000000000004
2024-03-09 14:04:57,421 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.59 vs. limit=11.620000000000001
2024-03-09 14:05:04,146 INFO [train.py:997] (0/4) Epoch 13, batch 100, loss[loss=0.1854, simple_loss=0.2647, pruned_loss=0.05286, over 24217.00 frames. ], tot_loss[loss=0.1833, simple_loss=0.2606, pruned_loss=0.05093, over 1878942.86 frames. ], batch size: 295, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:05:22,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=13373.333333333334, ans=0.125
2024-03-09 14:05:22,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=13373.333333333334, ans=0.125
2024-03-09 14:05:24,891 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.077e+01 1.017e+02 1.138e+02 1.327e+02 1.773e+02, threshold=2.276e+02, percent-clipped=0.0
2024-03-09 14:05:29,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=13373.333333333334, ans=0.0
2024-03-09 14:05:35,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=13440.0, ans=0.125
2024-03-09 14:05:42,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=13440.0, ans=0.010666666666666672
2024-03-09 14:05:42,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=13440.0, ans=0.125
2024-03-09 14:06:10,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=13573.333333333334, ans=0.42493333333333333
2024-03-09 14:06:12,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=13573.333333333334, ans=0.0
2024-03-09 14:06:13,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13573.333333333334, ans=0.16426666666666667
2024-03-09 14:06:23,139 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. limit=8.393333333333334
2024-03-09 14:06:24,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=13640.0, ans=0.125
2024-03-09 14:06:25,353 INFO [train.py:997] (0/4) Epoch 13, batch 150, loss[loss=0.1975, simple_loss=0.2819, pruned_loss=0.05652, over 23794.00 frames. ], tot_loss[loss=0.183, simple_loss=0.2617, pruned_loss=0.05095, over 2509743.01 frames. ], batch size: 447, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:06:26,242 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=9.456
2024-03-09 14:06:27,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=13640.0, ans=0.42260000000000003
2024-03-09 14:06:31,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=13640.0, ans=0.125
2024-03-09 14:06:33,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=13640.0, ans=0.125
2024-03-09 14:06:37,593 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-13.pt
2024-03-09 14:07:22,760 INFO [train.py:997] (0/4) Epoch 14, batch 0, loss[loss=0.1767, simple_loss=0.2619, pruned_loss=0.04574, over 24148.00 frames. ], tot_loss[loss=0.1767, simple_loss=0.2619, pruned_loss=0.04574, over 24148.00 frames. ], batch size: 345, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:07:22,760 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:07:32,051 INFO [train.py:1029] (0/4) Epoch 14, validation: loss=0.2172, simple_loss=0.3059, pruned_loss=0.06427, over 452978.00 frames. 
2024-03-09 14:07:32,052 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:07:46,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=13760.0, ans=0.00933333333333334
2024-03-09 14:08:38,098 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:08:48,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=13960.0, ans=0.007834782608695651
2024-03-09 14:08:51,018 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.47 vs. limit=9.584
2024-03-09 14:08:53,286 INFO [train.py:997] (0/4) Epoch 14, batch 50, loss[loss=0.1481, simple_loss=0.2405, pruned_loss=0.02789, over 21469.00 frames. ], tot_loss[loss=0.1787, simple_loss=0.2592, pruned_loss=0.04904, over 1071242.12 frames. ], batch size: 714, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:08:59,478 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 1.028e+02 1.152e+02 1.303e+02 2.373e+02, threshold=2.304e+02, percent-clipped=1.0
2024-03-09 14:09:23,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=14093.333333333334, ans=0.007805797101449276
2024-03-09 14:09:32,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=14160.0, ans=0.40440000000000004
2024-03-09 14:09:40,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=14226.666666666666, ans=0.125
2024-03-09 14:09:40,652 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=18.17
2024-03-09 14:10:12,224 INFO [train.py:997] (0/4) Epoch 14, batch 100, loss[loss=0.1726, simple_loss=0.2566, pruned_loss=0.04432, over 24220.00 frames. ], tot_loss[loss=0.1776, simple_loss=0.2588, pruned_loss=0.04818, over 1885745.77 frames. ], batch size: 241, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:10:40,713 INFO [scaling.py:1023] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=6.8853333333333335
2024-03-09 14:10:47,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=14493.333333333334, ans=0.125
2024-03-09 14:11:08,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=14560.0, ans=0.125
2024-03-09 14:11:22,764 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.17 vs. limit=9.850666666666665
2024-03-09 14:11:35,856 INFO [train.py:997] (0/4) Epoch 14, batch 150, loss[loss=0.1914, simple_loss=0.2777, pruned_loss=0.05254, over 23993.00 frames. ], tot_loss[loss=0.179, simple_loss=0.2608, pruned_loss=0.04857, over 2514372.92 frames. ], batch size: 388, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:11:41,695 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 9.664e+01 1.070e+02 1.194e+02 2.380e+02, threshold=2.140e+02, percent-clipped=1.0
2024-03-09 14:11:47,749 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-14.pt
2024-03-09 14:12:33,830 INFO [train.py:997] (0/4) Epoch 15, batch 0, loss[loss=0.1984, simple_loss=0.283, pruned_loss=0.05691, over 23768.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.283, pruned_loss=0.05691, over 23768.00 frames. ], batch size: 486, lr: 2.17e-02, grad_scale: 32.0
2024-03-09 14:12:33,831 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:12:40,159 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.2567, 2.3136, 2.0556, 2.0763, 2.1980, 2.1469, 2.0836, 2.2145], device='cuda:0')
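The zipformer.py:1858 diagnostics printed during validation (here and at the Epoch 12 validation earlier) give the entropy of each attention head's weight distribution for a named self_attn_weights module: values near log(sequence_length) indicate nearly uniform attention, values near zero indicate sharply peaked attention. A standard per-head entropy computation is sketched below; the tensor shape convention is an assumption.

```python
# Per-head entropy of attention weights (in nats), a diagnostic of how
# spread-out each head's attention is.  Assumed shape convention:
# attn_weights is (num_heads, query_positions, key_positions) and each row
# over key_positions sums to 1.
import torch

def attn_weights_entropy(attn_weights: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    return entropy.mean(dim=-1)   # average over query positions -> one value per head

weights = torch.softmax(torch.randn(4, 10, 10), dim=-1)
print(attn_weights_entropy(weights))   # one entropy per head, at most log(10) ~= 2.30
```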
2024-03-09 14:12:43,268 INFO [train.py:1029] (0/4) Epoch 15, validation: loss=0.2144, simple_loss=0.3029, pruned_loss=0.06295, over 452978.00 frames. 
2024-03-09 14:12:43,269 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:13:01,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=14813.333333333334, ans=0.125
2024-03-09 14:13:14,630 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.90 vs. limit=8.703333333333333
2024-03-09 14:13:24,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=14880.0, ans=0.3792
2024-03-09 14:14:02,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=15013.333333333334, ans=0.09899494936611666
2024-03-09 14:14:04,899 INFO [train.py:997] (0/4) Epoch 15, batch 50, loss[loss=0.1676, simple_loss=0.2478, pruned_loss=0.04363, over 24107.00 frames. ], tot_loss[loss=0.1756, simple_loss=0.2578, pruned_loss=0.04672, over 1067826.22 frames. ], batch size: 176, lr: 2.17e-02, grad_scale: 32.0
2024-03-09 14:14:08,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=15080.0, ans=0.3722
2024-03-09 14:15:02,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=15280.0, ans=0.125
2024-03-09 14:15:08,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=15346.666666666666, ans=0.125
2024-03-09 14:15:10,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=15346.666666666666, ans=0.125
2024-03-09 14:15:19,050 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.102e+01 1.026e+02 1.164e+02 1.400e+02 2.237e+02, threshold=2.327e+02, percent-clipped=1.0
2024-03-09 14:15:27,120 INFO [train.py:997] (0/4) Epoch 15, batch 100, loss[loss=0.1698, simple_loss=0.2591, pruned_loss=0.04019, over 24259.00 frames. ], tot_loss[loss=0.1748, simple_loss=0.2574, pruned_loss=0.04613, over 1886144.59 frames. ], batch size: 295, lr: 2.17e-02, grad_scale: 32.0
2024-03-09 14:15:30,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=15413.333333333334, ans=0.0024444444444444435
2024-03-09 14:15:54,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=15480.0, ans=0.007504347826086957
2024-03-09 14:15:56,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=15480.0, ans=0.007504347826086957
2024-03-09 14:16:46,435 INFO [train.py:997] (0/4) Epoch 15, batch 150, loss[loss=0.1882, simple_loss=0.2646, pruned_loss=0.05595, over 23887.00 frames. ], tot_loss[loss=0.1737, simple_loss=0.2564, pruned_loss=0.04554, over 2498734.50 frames. ], batch size: 153, lr: 2.16e-02, grad_scale: 32.0
2024-03-09 14:16:58,744 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-15.pt
2024-03-09 14:17:45,381 INFO [train.py:997] (0/4) Epoch 16, batch 0, loss[loss=0.169, simple_loss=0.2606, pruned_loss=0.03866, over 23966.00 frames. ], tot_loss[loss=0.169, simple_loss=0.2606, pruned_loss=0.03866, over 23966.00 frames. ], batch size: 387, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:17:45,382 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:17:55,604 INFO [train.py:1029] (0/4) Epoch 16, validation: loss=0.2134, simple_loss=0.3039, pruned_loss=0.06146, over 452978.00 frames. 
2024-03-09 14:17:55,604 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:18:08,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15800.0, ans=0.14200000000000002
2024-03-09 14:18:36,826 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=13.475
2024-03-09 14:18:51,994 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=13.5
2024-03-09 14:19:01,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=16000.0, ans=0.0
2024-03-09 14:19:03,237 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.632e+01 1.007e+02 1.180e+02 1.868e+02, threshold=2.014e+02, percent-clipped=0.0
2024-03-09 14:19:21,889 INFO [train.py:997] (0/4) Epoch 16, batch 50, loss[loss=0.1638, simple_loss=0.2484, pruned_loss=0.0396, over 24164.00 frames. ], tot_loss[loss=0.1662, simple_loss=0.2508, pruned_loss=0.04081, over 1074508.98 frames. ], batch size: 217, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:19:25,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=16133.333333333334, ans=0.0
2024-03-09 14:19:28,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=16133.333333333334, ans=0.125
2024-03-09 14:19:29,359 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.23 vs. limit=19.6
2024-03-09 14:19:39,059 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:19:39,732 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.66 vs. limit=5.43
2024-03-09 14:20:05,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=16266.666666666666, ans=0.125
2024-03-09 14:20:26,003 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.50 vs. limit=13.2
2024-03-09 14:20:28,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=16400.0, ans=0.0
2024-03-09 14:20:38,766 INFO [train.py:997] (0/4) Epoch 16, batch 100, loss[loss=0.1717, simple_loss=0.2491, pruned_loss=0.04711, over 24232.00 frames. ], tot_loss[loss=0.1677, simple_loss=0.2517, pruned_loss=0.04187, over 1892200.52 frames. ], batch size: 241, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:21:16,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=16600.0, ans=0.025
2024-03-09 14:21:36,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16666.666666666668, ans=0.1333333333333333
2024-03-09 14:21:43,561 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.931e+01 9.706e+01 1.091e+02 1.368e+02, threshold=1.941e+02, percent-clipped=0.0
2024-03-09 14:22:02,412 INFO [train.py:997] (0/4) Epoch 16, batch 150, loss[loss=0.1989, simple_loss=0.2811, pruned_loss=0.05834, over 23724.00 frames. ], tot_loss[loss=0.1686, simple_loss=0.2528, pruned_loss=0.04221, over 2520276.84 frames. ], batch size: 486, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:22:07,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=16800.0, ans=0.132
2024-03-09 14:22:14,613 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-16.pt
2024-03-09 14:23:00,943 INFO [train.py:997] (0/4) Epoch 17, batch 0, loss[loss=0.163, simple_loss=0.2512, pruned_loss=0.03741, over 24204.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2512, pruned_loss=0.03741, over 24204.00 frames. ], batch size: 295, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:23:00,943 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:23:11,390 INFO [train.py:1029] (0/4) Epoch 17, validation: loss=0.215, simple_loss=0.3066, pruned_loss=0.06175, over 452978.00 frames. 
2024-03-09 14:23:11,391 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:23:27,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=16920.0, ans=0.0
2024-03-09 14:23:49,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=16986.666666666668, ans=0.125
2024-03-09 14:23:51,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=16986.666666666668, ans=0.0
2024-03-09 14:24:19,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=17120.0, ans=0.125
2024-03-09 14:24:36,071 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.62 vs. limit=20.39
2024-03-09 14:24:36,337 INFO [train.py:997] (0/4) Epoch 17, batch 50, loss[loss=0.1638, simple_loss=0.2541, pruned_loss=0.03677, over 24265.00 frames. ], tot_loss[loss=0.1667, simple_loss=0.2512, pruned_loss=0.04117, over 1074718.53 frames. ], batch size: 267, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:24:59,753 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.830e-03
2024-03-09 14:25:04,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=17253.333333333332, ans=0.125
2024-03-09 14:25:22,879 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 9.326e+01 1.031e+02 1.175e+02 1.521e+02, threshold=2.062e+02, percent-clipped=0.0
2024-03-09 14:25:37,793 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=14.044999999999998
2024-03-09 14:25:49,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=17453.333333333332, ans=0.125
2024-03-09 14:25:56,613 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=20.64
2024-03-09 14:25:57,090 INFO [train.py:997] (0/4) Epoch 17, batch 100, loss[loss=0.1625, simple_loss=0.2498, pruned_loss=0.03755, over 24270.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.2516, pruned_loss=0.0411, over 1882719.50 frames. ], batch size: 254, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:26:11,731 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=5.638
2024-03-09 14:26:21,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=17586.666666666668, ans=0.09899494936611666
2024-03-09 14:26:21,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=17586.666666666668, ans=0.125
2024-03-09 14:27:15,877 INFO [train.py:997] (0/4) Epoch 17, batch 150, loss[loss=0.1671, simple_loss=0.2548, pruned_loss=0.03968, over 24150.00 frames. ], tot_loss[loss=0.1668, simple_loss=0.2514, pruned_loss=0.04109, over 2517570.99 frames. ], batch size: 345, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:27:28,466 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-17.pt
2024-03-09 14:28:12,293 INFO [train.py:997] (0/4) Epoch 18, batch 0, loss[loss=0.1544, simple_loss=0.2397, pruned_loss=0.03458, over 24280.00 frames. ], tot_loss[loss=0.1544, simple_loss=0.2397, pruned_loss=0.03458, over 24280.00 frames. ], batch size: 229, lr: 1.96e-02, grad_scale: 32.0
2024-03-09 14:28:12,294 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:28:22,756 INFO [train.py:1029] (0/4) Epoch 18, validation: loss=0.213, simple_loss=0.3039, pruned_loss=0.06107, over 452978.00 frames. 
2024-03-09 14:28:22,756 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:28:29,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=17906.666666666668, ans=0.125
2024-03-09 14:28:32,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=17906.666666666668, ans=0.0
2024-03-09 14:28:46,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=17973.333333333332, ans=0.125
2024-03-09 14:28:47,088 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.11 vs. limit=13.986666666666666
2024-03-09 14:29:02,462 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.00 vs. limit=9.51
2024-03-09 14:29:02,778 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.782e+01 9.645e+01 1.059e+02 1.496e+02, threshold=1.929e+02, percent-clipped=0.0
2024-03-09 14:29:10,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=18040.0, ans=0.125
2024-03-09 14:29:11,444 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=11.216000000000001
2024-03-09 14:29:22,502 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=14.29
2024-03-09 14:29:38,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=18173.333333333332, ans=0.125
2024-03-09 14:29:39,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=18173.333333333332, ans=0.006918840579710145
2024-03-09 14:29:44,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=18240.0, ans=0.125
2024-03-09 14:29:45,631 INFO [train.py:997] (0/4) Epoch 18, batch 50, loss[loss=0.1544, simple_loss=0.2399, pruned_loss=0.03446, over 24260.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2474, pruned_loss=0.03955, over 1069503.57 frames. ], batch size: 198, lr: 1.96e-02, grad_scale: 32.0
2024-03-09 14:30:20,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=18373.333333333332, ans=0.25693333333333346
2024-03-09 14:30:33,256 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=14.415
2024-03-09 14:30:37,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=18440.0, ans=0.0
2024-03-09 14:30:41,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=18440.0, ans=0.0
2024-03-09 14:31:00,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=18506.666666666668, ans=0.006846376811594203
2024-03-09 14:31:06,274 INFO [train.py:997] (0/4) Epoch 18, batch 100, loss[loss=0.163, simple_loss=0.249, pruned_loss=0.03848, over 24302.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2477, pruned_loss=0.03865, over 1882939.63 frames. ], batch size: 241, lr: 1.96e-02, grad_scale: 32.0
2024-03-09 14:31:15,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=18573.333333333332, ans=0.06426666666666667
2024-03-09 14:31:39,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=18706.666666666668, ans=0.125
2024-03-09 14:31:41,829 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.054e+01 8.645e+01 9.593e+01 1.057e+02 1.559e+02, threshold=1.919e+02, percent-clipped=0.0
2024-03-09 14:31:49,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=18706.666666666668, ans=0.125
2024-03-09 14:31:52,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=18706.666666666668, ans=0.0
2024-03-09 14:31:56,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=18773.333333333332, ans=0.11226666666666668
2024-03-09 14:32:09,675 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=5.8260000000000005
2024-03-09 14:32:26,168 INFO [train.py:997] (0/4) Epoch 18, batch 150, loss[loss=0.1676, simple_loss=0.2505, pruned_loss=0.04236, over 24078.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2492, pruned_loss=0.03919, over 2521185.23 frames. ], batch size: 165, lr: 1.95e-02, grad_scale: 32.0
2024-03-09 14:32:38,333 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-18.pt
2024-03-09 14:33:23,372 INFO [train.py:997] (0/4) Epoch 19, batch 0, loss[loss=0.1729, simple_loss=0.264, pruned_loss=0.04085, over 24022.00 frames. ], tot_loss[loss=0.1729, simple_loss=0.264, pruned_loss=0.04085, over 24022.00 frames. ], batch size: 416, lr: 1.90e-02, grad_scale: 32.0
2024-03-09 14:33:23,373 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:33:30,781 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.4285, 5.0987, 5.3784, 5.1244], device='cuda:0')
2024-03-09 14:33:35,286 INFO [train.py:1029] (0/4) Epoch 19, validation: loss=0.2133, simple_loss=0.3046, pruned_loss=0.061, over 452978.00 frames. 
2024-03-09 14:33:35,287 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:33:38,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=18960.0, ans=0.024620000000000003
2024-03-09 14:34:06,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=19026.666666666668, ans=0.125
2024-03-09 14:34:31,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19160.0, ans=0.10840000000000002
2024-03-09 14:34:35,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=19160.0, ans=0.125
2024-03-09 14:34:46,993 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.30 vs. limit=14.613333333333335
2024-03-09 14:34:51,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=19226.666666666668, ans=0.006689855072463767
2024-03-09 14:34:53,443 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=14.71
2024-03-09 14:34:55,506 INFO [train.py:997] (0/4) Epoch 19, batch 50, loss[loss=0.165, simple_loss=0.2532, pruned_loss=0.03836, over 24187.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2465, pruned_loss=0.03701, over 1071248.87 frames. ], batch size: 295, lr: 1.90e-02, grad_scale: 32.0
2024-03-09 14:34:55,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19293.333333333332, ans=0.1070666666666667
2024-03-09 14:35:08,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=19293.333333333332, ans=0.00667536231884058
2024-03-09 14:35:08,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19293.333333333332, ans=0.0
2024-03-09 14:35:17,297 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 8.675e+01 9.444e+01 1.046e+02 1.924e+02, threshold=1.889e+02, percent-clipped=1.0
2024-03-09 14:35:42,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=19493.333333333332, ans=0.49239999999999995
2024-03-09 14:35:47,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=19493.333333333332, ans=0.05
2024-03-09 14:35:50,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=19493.333333333332, ans=0.0
2024-03-09 14:35:58,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=19560.0, ans=5.934
2024-03-09 14:36:16,165 INFO [train.py:997] (0/4) Epoch 19, batch 100, loss[loss=0.1576, simple_loss=0.2467, pruned_loss=0.03418, over 24199.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2474, pruned_loss=0.03783, over 1882403.70 frames. ], batch size: 280, lr: 1.90e-02, grad_scale: 32.0
2024-03-09 14:36:21,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19626.666666666668, ans=0.10373333333333334
2024-03-09 14:36:27,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=19626.666666666668, ans=0.125
2024-03-09 14:36:32,737 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=14.86
2024-03-09 14:36:36,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=19693.333333333332, ans=0.21073333333333344
2024-03-09 14:36:39,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=19693.333333333332, ans=0.0
2024-03-09 14:36:43,199 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. limit=9.923333333333332
2024-03-09 14:36:53,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19760.0, ans=0.10240000000000002
2024-03-09 14:36:54,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19760.0, ans=0.10240000000000002
2024-03-09 14:37:14,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=19826.666666666668, ans=0.006559420289855072
2024-03-09 14:37:27,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19893.333333333332, ans=0.0
2024-03-09 14:37:36,535 INFO [train.py:997] (0/4) Epoch 19, batch 150, loss[loss=0.2038, simple_loss=0.281, pruned_loss=0.06329, over 23262.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2488, pruned_loss=0.03824, over 2517229.94 frames. ], batch size: 534, lr: 1.89e-02, grad_scale: 32.0
2024-03-09 14:37:43,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=19960.0, ans=0.125
2024-03-09 14:37:49,377 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-19.pt
2024-03-09 14:38:31,428 INFO [train.py:997] (0/4) Epoch 20, batch 0, loss[loss=0.1662, simple_loss=0.2468, pruned_loss=0.04281, over 24051.00 frames. ], tot_loss[loss=0.1662, simple_loss=0.2468, pruned_loss=0.04281, over 24051.00 frames. ], batch size: 176, lr: 1.85e-02, grad_scale: 32.0
2024-03-09 14:38:31,429 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:38:38,128 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1197, 3.9310, 3.8895, 3.4368], device='cuda:0')
2024-03-09 14:38:40,964 INFO [train.py:1029] (0/4) Epoch 20, validation: loss=0.2111, simple_loss=0.3031, pruned_loss=0.05952, over 452978.00 frames. 
2024-03-09 14:38:40,964 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:38:53,196 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.448e+01 9.307e+01 1.038e+02 2.078e+02, threshold=1.861e+02, percent-clipped=1.0
2024-03-09 14:38:53,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20013.333333333332, ans=0.1
2024-03-09 14:39:37,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=20213.333333333332, ans=0.125
2024-03-09 14:39:57,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=20280.0, ans=0.05
2024-03-09 14:39:59,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=20280.0, ans=0.0
2024-03-09 14:40:03,546 INFO [train.py:997] (0/4) Epoch 20, batch 50, loss[loss=0.1468, simple_loss=0.2296, pruned_loss=0.03198, over 23582.00 frames. ], tot_loss[loss=0.1552, simple_loss=0.241, pruned_loss=0.0347, over 1076970.64 frames. ], batch size: 128, lr: 1.84e-02, grad_scale: 32.0
2024-03-09 14:41:11,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=20613.333333333332, ans=0.09899494936611666
2024-03-09 14:41:25,658 INFO [train.py:997] (0/4) Epoch 20, batch 100, loss[loss=0.1671, simple_loss=0.2498, pruned_loss=0.04219, over 24104.00 frames. ], tot_loss[loss=0.1591, simple_loss=0.2448, pruned_loss=0.03668, over 1894606.49 frames. ], batch size: 165, lr: 1.84e-02, grad_scale: 32.0
2024-03-09 14:41:34,816 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.448e+01 8.010e+01 8.832e+01 9.695e+01 1.353e+02, threshold=1.766e+02, percent-clipped=0.0
2024-03-09 14:41:40,236 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=12.0
2024-03-09 14:41:49,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=20746.666666666668, ans=0.2
2024-03-09 14:42:04,612 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0
2024-03-09 14:42:04,942 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0
2024-03-09 14:42:07,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=20813.333333333332, ans=0.07
2024-03-09 14:42:36,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=20946.666666666668, ans=0.0
2024-03-09 14:42:44,020 INFO [train.py:997] (0/4) Epoch 20, batch 150, loss[loss=0.1465, simple_loss=0.232, pruned_loss=0.03051, over 24252.00 frames. ], tot_loss[loss=0.1585, simple_loss=0.2445, pruned_loss=0.03627, over 2518931.36 frames. ], batch size: 229, lr: 1.84e-02, grad_scale: 32.0
2024-03-09 14:42:52,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=21013.333333333332, ans=0.125
2024-03-09 14:42:53,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=21013.333333333332, ans=0.006301449275362319
2024-03-09 14:42:56,093 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-20.pt
2024-03-09 14:43:39,749 INFO [train.py:997] (0/4) Epoch 21, batch 0, loss[loss=0.1557, simple_loss=0.2395, pruned_loss=0.03593, over 22561.00 frames. ], tot_loss[loss=0.1557, simple_loss=0.2395, pruned_loss=0.03593, over 22561.00 frames. ], batch size: 85, lr: 1.79e-02, grad_scale: 32.0
2024-03-09 14:43:39,750 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:43:49,466 INFO [train.py:1029] (0/4) Epoch 21, validation: loss=0.2106, simple_loss=0.3015, pruned_loss=0.05984, over 452978.00 frames. 
2024-03-09 14:43:49,467 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:44:14,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=21133.333333333332, ans=0.2
2024-03-09 14:44:25,168 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.598e-02
2024-03-09 14:44:26,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=21200.0, ans=0.125
2024-03-09 14:44:41,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=21266.666666666668, ans=0.125
2024-03-09 14:45:10,283 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.134e+01 8.236e+01 9.284e+01 1.075e+02 1.651e+02, threshold=1.857e+02, percent-clipped=0.0
2024-03-09 14:45:13,759 INFO [train.py:997] (0/4) Epoch 21, batch 50, loss[loss=0.1334, simple_loss=0.2286, pruned_loss=0.01914, over 21496.00 frames. ], tot_loss[loss=0.1576, simple_loss=0.2452, pruned_loss=0.03499, over 1066137.56 frames. ], batch size: 717, lr: 1.79e-02, grad_scale: 32.0
2024-03-09 14:45:18,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=21400.0, ans=0.125
2024-03-09 14:45:20,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=21400.0, ans=0.1
2024-03-09 14:45:23,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21400.0, ans=0.1
2024-03-09 14:45:37,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=21466.666666666668, ans=0.0
2024-03-09 14:45:57,078 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0
2024-03-09 14:46:00,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=21600.0, ans=0.2
2024-03-09 14:46:29,323 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0
2024-03-09 14:46:32,881 INFO [train.py:997] (0/4) Epoch 21, batch 100, loss[loss=0.1936, simple_loss=0.2731, pruned_loss=0.05707, over 23234.00 frames. ], tot_loss[loss=0.1586, simple_loss=0.2461, pruned_loss=0.03551, over 1889581.54 frames. ], batch size: 534, lr: 1.79e-02, grad_scale: 64.0
2024-03-09 14:47:29,119 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.37 vs. limit=22.5
2024-03-09 14:47:35,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=21933.333333333332, ans=0.125
2024-03-09 14:47:48,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=22000.0, ans=0.95
2024-03-09 14:47:51,969 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.144e+01 8.919e+01 1.026e+02 1.301e+02, threshold=1.784e+02, percent-clipped=0.0
2024-03-09 14:47:55,065 INFO [train.py:997] (0/4) Epoch 21, batch 150, loss[loss=0.133, simple_loss=0.2276, pruned_loss=0.0192, over 21551.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2472, pruned_loss=0.03658, over 2523300.87 frames. ], batch size: 718, lr: 1.79e-02, grad_scale: 64.0
2024-03-09 14:48:07,327 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-21.pt
2024-03-09 14:48:50,783 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.72 vs. limit=12.0
2024-03-09 14:48:51,238 INFO [train.py:997] (0/4) Epoch 22, batch 0, loss[loss=0.1625, simple_loss=0.2543, pruned_loss=0.03538, over 24015.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2543, pruned_loss=0.03538, over 24015.00 frames. ], batch size: 416, lr: 1.74e-02, grad_scale: 64.0
2024-03-09 14:48:51,239 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:49:00,964 INFO [train.py:1029] (0/4) Epoch 22, validation: loss=0.2117, simple_loss=0.3028, pruned_loss=0.06033, over 452978.00 frames. 
2024-03-09 14:49:00,965 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:49:01,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22120.0, ans=0.1
2024-03-09 14:49:12,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=22120.0, ans=0.2
2024-03-09 14:49:29,886 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0
2024-03-09 14:49:56,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=22320.0, ans=0.2
2024-03-09 14:49:57,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22320.0, ans=0.1
2024-03-09 14:50:07,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=22386.666666666668, ans=0.0
2024-03-09 14:50:23,732 INFO [train.py:997] (0/4) Epoch 22, batch 50, loss[loss=0.1616, simple_loss=0.2404, pruned_loss=0.04134, over 23927.00 frames. ], tot_loss[loss=0.1542, simple_loss=0.2416, pruned_loss=0.0334, over 1068791.91 frames. ], batch size: 153, lr: 1.74e-02, grad_scale: 64.0
2024-03-09 14:50:33,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=22453.333333333332, ans=0.07
2024-03-09 14:50:38,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=22520.0, ans=0.0
2024-03-09 14:51:04,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=22586.666666666668, ans=0.0
2024-03-09 14:51:05,295 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0
2024-03-09 14:51:18,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=22653.333333333332, ans=0.125
2024-03-09 14:51:28,016 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.132e+01 8.918e+01 9.986e+01 1.265e+02, threshold=1.784e+02, percent-clipped=0.0
2024-03-09 14:51:45,175 INFO [train.py:997] (0/4) Epoch 22, batch 100, loss[loss=0.1544, simple_loss=0.2491, pruned_loss=0.02986, over 24063.00 frames. ], tot_loss[loss=0.1541, simple_loss=0.2412, pruned_loss=0.03344, over 1880311.48 frames. ], batch size: 365, lr: 1.74e-02, grad_scale: 64.0
2024-03-09 14:51:48,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=22786.666666666668, ans=0.125
2024-03-09 14:51:49,463 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.12 vs. limit=15.0
2024-03-09 14:52:05,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=22853.333333333332, ans=0.005901449275362319
2024-03-09 14:52:20,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=22920.0, ans=0.1
2024-03-09 14:52:28,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=22920.0, ans=0.125
2024-03-09 14:52:29,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=22920.0, ans=0.125
2024-03-09 14:52:37,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=22986.666666666668, ans=0.0
2024-03-09 14:52:52,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=23053.333333333332, ans=0.0
2024-03-09 14:52:52,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=23053.333333333332, ans=0.5
2024-03-09 14:53:00,363 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0
2024-03-09 14:53:01,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=23053.333333333332, ans=0.005857971014492754
2024-03-09 14:53:05,950 INFO [train.py:997] (0/4) Epoch 22, batch 150, loss[loss=0.1542, simple_loss=0.2448, pruned_loss=0.03185, over 24194.00 frames. ], tot_loss[loss=0.1545, simple_loss=0.2424, pruned_loss=0.03335, over 2516578.38 frames. ], batch size: 241, lr: 1.74e-02, grad_scale: 64.0
2024-03-09 14:53:15,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=23120.0, ans=0.125
2024-03-09 14:53:18,515 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-22.pt
2024-03-09 14:54:00,138 INFO [train.py:997] (0/4) Epoch 23, batch 0, loss[loss=0.1547, simple_loss=0.2325, pruned_loss=0.03842, over 20296.00 frames. ], tot_loss[loss=0.1547, simple_loss=0.2325, pruned_loss=0.03842, over 20296.00 frames. ], batch size: 60, lr: 1.70e-02, grad_scale: 64.0
2024-03-09 14:54:00,139 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:54:09,892 INFO [train.py:1029] (0/4) Epoch 23, validation: loss=0.2115, simple_loss=0.3036, pruned_loss=0.0597, over 452978.00 frames. 
2024-03-09 14:54:09,893 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:55:00,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=23373.333333333332, ans=0.125
2024-03-09 14:55:05,176 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.526e+01 7.783e+01 8.704e+01 9.596e+01 1.275e+02, threshold=1.741e+02, percent-clipped=0.0
2024-03-09 14:55:07,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=23373.333333333332, ans=0.125
2024-03-09 14:55:33,121 INFO [train.py:997] (0/4) Epoch 23, batch 50, loss[loss=0.1242, simple_loss=0.2224, pruned_loss=0.01296, over 21644.00 frames. ], tot_loss[loss=0.1508, simple_loss=0.2378, pruned_loss=0.03189, over 1055970.27 frames. ], batch size: 718, lr: 1.70e-02, grad_scale: 64.0
2024-03-09 14:55:35,060 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:55:50,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=23573.333333333332, ans=0.1
2024-03-09 14:55:53,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=23573.333333333332, ans=0.125
2024-03-09 14:56:01,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=23573.333333333332, ans=0.0
2024-03-09 14:56:07,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=23640.0, ans=0.0
2024-03-09 14:56:09,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=23640.0, ans=0.05
2024-03-09 14:56:51,640 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0
2024-03-09 14:56:53,727 INFO [train.py:997] (0/4) Epoch 23, batch 100, loss[loss=0.1374, simple_loss=0.227, pruned_loss=0.02393, over 23991.00 frames. ], tot_loss[loss=0.1535, simple_loss=0.2411, pruned_loss=0.03292, over 1873457.87 frames. ], batch size: 142, lr: 1.69e-02, grad_scale: 64.0
2024-03-09 14:57:00,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=23840.0, ans=0.125
2024-03-09 14:57:02,393 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0
2024-03-09 14:57:09,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=23906.666666666668, ans=0.125
2024-03-09 14:57:45,468 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.240e+01 7.813e+01 8.574e+01 9.589e+01 1.326e+02, threshold=1.715e+02, percent-clipped=0.0
2024-03-09 14:58:13,656 INFO [train.py:997] (0/4) Epoch 23, batch 150, loss[loss=0.1579, simple_loss=0.2442, pruned_loss=0.03576, over 24253.00 frames. ], tot_loss[loss=0.1534, simple_loss=0.2413, pruned_loss=0.03276, over 2510509.13 frames. ], batch size: 198, lr: 1.69e-02, grad_scale: 64.0
2024-03-09 14:58:25,925 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-23.pt
2024-03-09 14:59:06,778 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=22.5
2024-03-09 14:59:07,185 INFO [train.py:997] (0/4) Epoch 24, batch 0, loss[loss=0.143, simple_loss=0.2308, pruned_loss=0.02763, over 20364.00 frames. ], tot_loss[loss=0.143, simple_loss=0.2308, pruned_loss=0.02763, over 20364.00 frames. ], batch size: 60, lr: 1.66e-02, grad_scale: 64.0
2024-03-09 14:59:07,185 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:59:16,706 INFO [train.py:1029] (0/4) Epoch 24, validation: loss=0.2123, simple_loss=0.3043, pruned_loss=0.06014, over 452978.00 frames. 
2024-03-09 14:59:16,707 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:59:49,533 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:59:51,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=24293.333333333332, ans=0.005588405797101449
2024-03-09 14:59:54,097 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:59:54,866 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=12.0
2024-03-09 15:00:26,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=24493.333333333332, ans=0.005544927536231884
2024-03-09 15:00:43,104 INFO [train.py:997] (0/4) Epoch 24, batch 50, loss[loss=0.1556, simple_loss=0.245, pruned_loss=0.03309, over 24205.00 frames. ], tot_loss[loss=0.1508, simple_loss=0.2378, pruned_loss=0.03194, over 1073196.93 frames. ], batch size: 295, lr: 1.65e-02, grad_scale: 64.0
2024-03-09 15:00:48,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=24560.0, ans=0.125
2024-03-09 15:00:52,062 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.04 vs. limit=10.0
2024-03-09 15:01:20,106 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.514e+01 7.866e+01 8.423e+01 9.105e+01 1.243e+02, threshold=1.685e+02, percent-clipped=0.0
2024-03-09 15:01:34,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=24760.0, ans=0.2
2024-03-09 15:02:03,700 INFO [train.py:997] (0/4) Epoch 24, batch 100, loss[loss=0.1498, simple_loss=0.2407, pruned_loss=0.02947, over 24119.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.2396, pruned_loss=0.03236, over 1880690.09 frames. ], batch size: 345, lr: 1.65e-02, grad_scale: 64.0
2024-03-09 15:02:12,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=24893.333333333332, ans=0.04949747468305833
2024-03-09 15:02:31,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=24960.0, ans=0.0
2024-03-09 15:02:48,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=25026.666666666668, ans=0.125
2024-03-09 15:02:55,393 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=22.5
2024-03-09 15:02:56,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=25093.333333333332, ans=0.2
2024-03-09 15:03:24,739 INFO [train.py:997] (0/4) Epoch 24, batch 150, loss[loss=0.1926, simple_loss=0.2715, pruned_loss=0.05689, over 23321.00 frames. ], tot_loss[loss=0.1531, simple_loss=0.2406, pruned_loss=0.03286, over 2517082.11 frames. ], batch size: 534, lr: 1.65e-02, grad_scale: 64.0
2024-03-09 15:03:36,211 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-24.pt
2024-03-09 15:04:17,871 INFO [train.py:997] (0/4) Epoch 25, batch 0, loss[loss=0.1555, simple_loss=0.2383, pruned_loss=0.03632, over 23989.00 frames. ], tot_loss[loss=0.1555, simple_loss=0.2383, pruned_loss=0.03632, over 23989.00 frames. ], batch size: 165, lr: 1.61e-02, grad_scale: 64.0
2024-03-09 15:04:17,872 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:04:27,731 INFO [train.py:1029] (0/4) Epoch 25, validation: loss=0.2123, simple_loss=0.3048, pruned_loss=0.05995, over 452978.00 frames. 
2024-03-09 15:04:27,732 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:04:56,153 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.291e+01 7.825e+01 8.498e+01 9.317e+01 1.197e+02, threshold=1.700e+02, percent-clipped=0.0
2024-03-09 15:05:24,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=25480.0, ans=0.0
2024-03-09 15:05:31,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25546.666666666668, ans=0.1
2024-03-09 15:05:35,922 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. limit=6.0
2024-03-09 15:05:38,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=25546.666666666668, ans=0.0
2024-03-09 15:05:45,983 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0
2024-03-09 15:05:50,869 INFO [train.py:997] (0/4) Epoch 25, batch 50, loss[loss=0.1952, simple_loss=0.2743, pruned_loss=0.05803, over 23275.00 frames. ], tot_loss[loss=0.1525, simple_loss=0.2401, pruned_loss=0.0325, over 1057002.68 frames. ], batch size: 534, lr: 1.61e-02, grad_scale: 64.0
2024-03-09 15:06:08,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=25680.0, ans=15.0
2024-03-09 15:06:16,559 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0
2024-03-09 15:07:06,212 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0
2024-03-09 15:07:11,204 INFO [train.py:997] (0/4) Epoch 25, batch 100, loss[loss=0.1584, simple_loss=0.2518, pruned_loss=0.03244, over 23996.00 frames. ], tot_loss[loss=0.1517, simple_loss=0.2394, pruned_loss=0.03203, over 1879158.15 frames. ], batch size: 416, lr: 1.61e-02, grad_scale: 64.0
2024-03-09 15:07:22,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=25946.666666666668, ans=0.125
2024-03-09 15:07:37,664 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.239e+01 7.935e+01 8.679e+01 9.503e+01 1.168e+02, threshold=1.736e+02, percent-clipped=0.0
2024-03-09 15:07:50,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=26080.0, ans=0.0
2024-03-09 15:07:58,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=26146.666666666668, ans=0.07
2024-03-09 15:08:31,506 INFO [train.py:997] (0/4) Epoch 25, batch 150, loss[loss=0.1308, simple_loss=0.2132, pruned_loss=0.02421, over 23677.00 frames. ], tot_loss[loss=0.1503, simple_loss=0.2382, pruned_loss=0.03123, over 2512466.32 frames. ], batch size: 116, lr: 1.61e-02, grad_scale: 64.0
2024-03-09 15:08:43,626 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-25.pt
2024-03-09 15:09:26,510 INFO [train.py:997] (0/4) Epoch 26, batch 0, loss[loss=0.148, simple_loss=0.2348, pruned_loss=0.03061, over 24281.00 frames. ], tot_loss[loss=0.148, simple_loss=0.2348, pruned_loss=0.03061, over 24281.00 frames. ], batch size: 281, lr: 1.58e-02, grad_scale: 64.0
2024-03-09 15:09:26,510 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:09:35,915 INFO [train.py:1029] (0/4) Epoch 26, validation: loss=0.2091, simple_loss=0.3013, pruned_loss=0.05842, over 452978.00 frames. 
2024-03-09 15:09:35,915 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:09:51,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=26333.333333333332, ans=0.05
2024-03-09 15:09:59,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=26400.0, ans=0.005130434782608696
2024-03-09 15:10:04,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=26400.0, ans=0.125
2024-03-09 15:10:18,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=26466.666666666668, ans=0.125
2024-03-09 15:10:46,303 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.24 vs. limit=10.0
2024-03-09 15:10:55,261 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/checkpoint-4000.pt
2024-03-09 15:10:59,541 INFO [train.py:997] (0/4) Epoch 26, batch 50, loss[loss=0.1583, simple_loss=0.2526, pruned_loss=0.032, over 24015.00 frames. ], tot_loss[loss=0.1477, simple_loss=0.2356, pruned_loss=0.02994, over 1071984.71 frames. ], batch size: 388, lr: 1.57e-02, grad_scale: 64.0
2024-03-09 15:11:01,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=26666.666666666668, ans=0.125
2024-03-09 15:11:09,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=26666.666666666668, ans=0.125
2024-03-09 15:11:11,922 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.387e+01 7.632e+01 8.183e+01 8.952e+01 1.265e+02, threshold=1.637e+02, percent-clipped=0.0
2024-03-09 15:12:19,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=26933.333333333332, ans=0.005014492753623189
2024-03-09 15:12:20,295 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=15.0
2024-03-09 15:12:21,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=27000.0, ans=0.125
2024-03-09 15:12:22,491 INFO [train.py:997] (0/4) Epoch 26, batch 100, loss[loss=0.1537, simple_loss=0.2503, pruned_loss=0.02859, over 24072.00 frames. ], tot_loss[loss=0.1487, simple_loss=0.2372, pruned_loss=0.03011, over 1874992.90 frames. ], batch size: 416, lr: 1.57e-02, grad_scale: 64.0
2024-03-09 15:13:02,494 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0
2024-03-09 15:13:03,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=27133.333333333332, ans=0.125
2024-03-09 15:13:20,909 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=12.0
2024-03-09 15:13:26,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27266.666666666668, ans=0.1
2024-03-09 15:13:41,515 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0
2024-03-09 15:13:42,259 INFO [train.py:997] (0/4) Epoch 26, batch 150, loss[loss=0.1536, simple_loss=0.2487, pruned_loss=0.02921, over 23951.00 frames. ], tot_loss[loss=0.1493, simple_loss=0.2381, pruned_loss=0.0303, over 2521242.32 frames. ], batch size: 416, lr: 1.57e-02, grad_scale: 64.0
2024-03-09 15:13:49,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=27333.333333333332, ans=0.125
2024-03-09 15:13:55,059 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-26.pt
2024-03-09 15:14:38,730 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.880e+01 7.565e+01 8.210e+01 9.162e+01 1.256e+02, threshold=1.642e+02, percent-clipped=0.0
2024-03-09 15:14:38,763 INFO [train.py:997] (0/4) Epoch 27, batch 0, loss[loss=0.1575, simple_loss=0.2397, pruned_loss=0.03766, over 23932.00 frames. ], tot_loss[loss=0.1575, simple_loss=0.2397, pruned_loss=0.03766, over 23932.00 frames. ], batch size: 153, lr: 1.54e-02, grad_scale: 64.0
2024-03-09 15:14:38,763 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:14:48,406 INFO [train.py:1029] (0/4) Epoch 27, validation: loss=0.2114, simple_loss=0.3031, pruned_loss=0.05987, over 452978.00 frames. 
2024-03-09 15:14:48,406 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:15:39,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=27520.0, ans=0.125
2024-03-09 15:15:51,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=27586.666666666668, ans=0.0
2024-03-09 15:15:54,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=27586.666666666668, ans=0.2
2024-03-09 15:16:04,701 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=10.0
2024-03-09 15:16:14,483 INFO [train.py:997] (0/4) Epoch 27, batch 50, loss[loss=0.1505, simple_loss=0.2362, pruned_loss=0.03237, over 24217.00 frames. ], tot_loss[loss=0.1516, simple_loss=0.2384, pruned_loss=0.03242, over 1078106.10 frames. ], batch size: 241, lr: 1.54e-02, grad_scale: 64.0
2024-03-09 15:16:17,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=27720.0, ans=0.125
2024-03-09 15:16:24,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=27720.0, ans=0.125
2024-03-09 15:16:25,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=27720.0, ans=0.07
2024-03-09 15:16:39,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=27786.666666666668, ans=0.004828985507246377
2024-03-09 15:16:50,030 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.75 vs. limit=10.0
2024-03-09 15:17:06,124 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 15:17:33,762 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.709e+01 7.734e+01 8.550e+01 9.615e+01 1.355e+02, threshold=1.710e+02, percent-clipped=0.0
2024-03-09 15:17:33,799 INFO [train.py:997] (0/4) Epoch 27, batch 100, loss[loss=0.1466, simple_loss=0.2324, pruned_loss=0.03042, over 23679.00 frames. ], tot_loss[loss=0.1491, simple_loss=0.2369, pruned_loss=0.03063, over 1897391.58 frames. ], batch size: 129, lr: 1.53e-02, grad_scale: 64.0
2024-03-09 15:17:44,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=28053.333333333332, ans=0.125
2024-03-09 15:18:03,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=28120.0, ans=0.025
2024-03-09 15:18:24,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=28253.333333333332, ans=0.0
2024-03-09 15:18:35,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=28253.333333333332, ans=0.2
2024-03-09 15:18:50,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=28320.0, ans=0.125
2024-03-09 15:18:54,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=28386.666666666668, ans=0.125
2024-03-09 15:18:55,715 INFO [train.py:997] (0/4) Epoch 27, batch 150, loss[loss=0.1433, simple_loss=0.2289, pruned_loss=0.02883, over 23245.00 frames. ], tot_loss[loss=0.1495, simple_loss=0.2383, pruned_loss=0.03039, over 2524252.50 frames. ], batch size: 102, lr: 1.53e-02, grad_scale: 64.0
2024-03-09 15:19:08,946 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-27.pt
2024-03-09 15:19:49,041 INFO [train.py:997] (0/4) Epoch 28, batch 0, loss[loss=0.1464, simple_loss=0.2333, pruned_loss=0.02969, over 24248.00 frames. ], tot_loss[loss=0.1464, simple_loss=0.2333, pruned_loss=0.02969, over 24248.00 frames. ], batch size: 188, lr: 1.50e-02, grad_scale: 64.0
2024-03-09 15:19:49,042 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:19:59,332 INFO [train.py:1029] (0/4) Epoch 28, validation: loss=0.2107, simple_loss=0.3034, pruned_loss=0.05903, over 452978.00 frames. 
2024-03-09 15:19:59,333 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:20:54,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=28640.0, ans=0.2
2024-03-09 15:21:11,008 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.458e+01 7.529e+01 8.136e+01 8.999e+01 1.198e+02, threshold=1.627e+02, percent-clipped=0.0
2024-03-09 15:21:23,147 INFO [train.py:997] (0/4) Epoch 28, batch 50, loss[loss=0.1528, simple_loss=0.238, pruned_loss=0.03385, over 24062.00 frames. ], tot_loss[loss=0.1489, simple_loss=0.2361, pruned_loss=0.03084, over 1059671.48 frames. ], batch size: 176, lr: 1.50e-02, grad_scale: 64.0
2024-03-09 15:21:27,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28773.333333333332, ans=0.1
2024-03-09 15:21:56,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=28906.666666666668, ans=0.004585507246376811
2024-03-09 15:22:02,501 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0
2024-03-09 15:22:23,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=28973.333333333332, ans=0.125
2024-03-09 15:22:25,560 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.45 vs. limit=22.5
2024-03-09 15:22:43,071 INFO [train.py:997] (0/4) Epoch 28, batch 100, loss[loss=0.1415, simple_loss=0.2314, pruned_loss=0.02578, over 23388.00 frames. ], tot_loss[loss=0.1471, simple_loss=0.2356, pruned_loss=0.02928, over 1873994.08 frames. ], batch size: 102, lr: 1.50e-02, grad_scale: 64.0
2024-03-09 15:23:34,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=29306.666666666668, ans=0.004498550724637681
2024-03-09 15:23:50,078 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.379e+01 7.431e+01 8.104e+01 8.725e+01 1.109e+02, threshold=1.621e+02, percent-clipped=0.0
2024-03-09 15:24:02,913 INFO [train.py:997] (0/4) Epoch 28, batch 150, loss[loss=0.1391, simple_loss=0.2318, pruned_loss=0.02323, over 24072.00 frames. ], tot_loss[loss=0.1469, simple_loss=0.236, pruned_loss=0.02888, over 2513280.61 frames. ], batch size: 344, lr: 1.50e-02, grad_scale: 64.0
2024-03-09 15:24:10,027 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 15:24:15,504 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-28.pt
2024-03-09 15:24:57,644 INFO [train.py:997] (0/4) Epoch 29, batch 0, loss[loss=0.1661, simple_loss=0.2594, pruned_loss=0.03642, over 23744.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2594, pruned_loss=0.03642, over 23744.00 frames. ], batch size: 486, lr: 1.47e-02, grad_scale: 64.0
2024-03-09 15:24:57,644 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:25:06,829 INFO [train.py:1029] (0/4) Epoch 29, validation: loss=0.2094, simple_loss=0.3019, pruned_loss=0.05844, over 452978.00 frames. 
2024-03-09 15:25:06,829 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:25:25,786 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 15:25:35,099 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0
2024-03-09 15:26:11,366 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 15:26:32,419 INFO [train.py:997] (0/4) Epoch 29, batch 50, loss[loss=0.16, simple_loss=0.2447, pruned_loss=0.03766, over 23922.00 frames. ], tot_loss[loss=0.1437, simple_loss=0.2346, pruned_loss=0.0264, over 1069248.24 frames. ], batch size: 153, lr: 1.47e-02, grad_scale: 64.0
2024-03-09 15:26:59,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=29893.333333333332, ans=0.125
2024-03-09 15:27:10,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=29960.0, ans=0.125
2024-03-09 15:27:27,042 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.485e+01 7.617e+01 8.419e+01 9.074e+01 1.218e+02, threshold=1.684e+02, percent-clipped=0.0
2024-03-09 15:27:34,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=30093.333333333332, ans=0.0
2024-03-09 15:27:38,565 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.31 vs. limit=15.0
2024-03-09 15:27:41,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=30093.333333333332, ans=0.125
2024-03-09 15:27:41,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30093.333333333332, ans=0.125
2024-03-09 15:27:55,006 INFO [train.py:997] (0/4) Epoch 29, batch 100, loss[loss=0.1432, simple_loss=0.239, pruned_loss=0.02375, over 24013.00 frames. ], tot_loss[loss=0.1462, simple_loss=0.2367, pruned_loss=0.02781, over 1887373.78 frames. ], batch size: 388, lr: 1.47e-02, grad_scale: 64.0
2024-03-09 15:27:58,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=30160.0, ans=0.0
2024-03-09 15:28:25,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30293.333333333332, ans=0.1
2024-03-09 15:28:39,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30293.333333333332, ans=0.125
2024-03-09 15:28:48,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30360.0, ans=0.1
2024-03-09 15:28:59,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=30426.666666666668, ans=0.125
2024-03-09 15:29:12,925 INFO [train.py:997] (0/4) Epoch 29, batch 150, loss[loss=0.1253, simple_loss=0.2077, pruned_loss=0.02143, over 23872.00 frames. ], tot_loss[loss=0.146, simple_loss=0.2357, pruned_loss=0.02818, over 2524369.47 frames. ], batch size: 117, lr: 1.46e-02, grad_scale: 64.0
2024-03-09 15:29:19,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=30493.333333333332, ans=0.0
2024-03-09 15:29:24,886 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-29.pt
2024-03-09 15:30:06,218 INFO [train.py:997] (0/4) Epoch 30, batch 0, loss[loss=0.1409, simple_loss=0.2265, pruned_loss=0.02767, over 24189.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.2265, pruned_loss=0.02767, over 24189.00 frames. ], batch size: 217, lr: 1.44e-02, grad_scale: 64.0
2024-03-09 15:30:06,219 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:30:13,365 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6786, 4.0017, 4.5676, 3.9307], device='cuda:0')
2024-03-09 15:30:18,510 INFO [train.py:1029] (0/4) Epoch 30, validation: loss=0.2105, simple_loss=0.3027, pruned_loss=0.05915, over 452978.00 frames. 
2024-03-09 15:30:18,511 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:30:46,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=30613.333333333332, ans=0.004214492753623188
2024-03-09 15:30:54,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30680.0, ans=0.1
2024-03-09 15:31:01,626 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.221e+01 6.992e+01 7.523e+01 8.232e+01 1.586e+02, threshold=1.505e+02, percent-clipped=0.0
2024-03-09 15:31:15,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=30746.666666666668, ans=0.2
2024-03-09 15:31:40,991 INFO [train.py:997] (0/4) Epoch 30, batch 50, loss[loss=0.1512, simple_loss=0.2498, pruned_loss=0.02631, over 23747.00 frames. ], tot_loss[loss=0.144, simple_loss=0.2324, pruned_loss=0.02777, over 1075031.15 frames. ], batch size: 447, lr: 1.44e-02, grad_scale: 64.0
2024-03-09 15:31:42,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=30880.0, ans=0.2
2024-03-09 15:32:14,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=31013.333333333332, ans=0.125
2024-03-09 15:32:18,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=31013.333333333332, ans=0.2
2024-03-09 15:32:48,602 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.20 vs. limit=6.0
2024-03-09 15:33:01,491 INFO [train.py:997] (0/4) Epoch 30, batch 100, loss[loss=0.1258, simple_loss=0.2124, pruned_loss=0.01958, over 24093.00 frames. ], tot_loss[loss=0.1452, simple_loss=0.234, pruned_loss=0.02825, over 1888671.92 frames. ], batch size: 142, lr: 1.43e-02, grad_scale: 64.0
2024-03-09 15:33:12,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=31213.333333333332, ans=0.0
2024-03-09 15:33:24,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=31280.0, ans=0.125
2024-03-09 15:33:31,688 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5
2024-03-09 15:33:43,907 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.080e+01 7.332e+01 7.826e+01 8.661e+01 1.231e+02, threshold=1.565e+02, percent-clipped=0.0
2024-03-09 15:34:20,969 INFO [train.py:997] (0/4) Epoch 30, batch 150, loss[loss=0.1436, simple_loss=0.2333, pruned_loss=0.027, over 24222.00 frames. ], tot_loss[loss=0.1449, simple_loss=0.2337, pruned_loss=0.02802, over 2520301.04 frames. ], batch size: 241, lr: 1.43e-02, grad_scale: 64.0
2024-03-09 15:34:30,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=31546.666666666668, ans=0.0
2024-03-09 15:34:33,217 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-30.pt
2024-03-09 15:34:38,240 INFO [train.py:1248] (0/4) Done!