# python3 -m espnet2.bin.asr_train --use_preprocessor true --bpemodel data/token_list/bpe_unigram2000/bpe.model --token_type bpe --token_list data/token_list/bpe_unigram2000/tokens.txt --non_linguistic_symbols none --cleaner none --g2p none --valid_data_path_and_name_and_type dump/raw/dev/wav.scp,speech,sound --valid_shape_file exp/asr_stats_raw_bpe2000_sp/valid/speech_shape --resume true --ignore_init_mismatch false --fold_length 80000 --output_dir exp/asr_train_raw_bpe2000_sp --config conf/train.yaml --frontend_conf fs=16k --normalize=global_mvn --normalize_conf stats_file=exp/asr_stats_raw_bpe2000_sp/train/feats_stats.npz --train_data_path_and_name_and_type dump/raw/train_sp/wav.scp,speech,sound --train_shape_file exp/asr_stats_raw_bpe2000_sp/train/speech_shape --fold_length 150 --train_data_path_and_name_and_type dump/raw/train_sp/text,text,text --train_shape_file exp/asr_stats_raw_bpe2000_sp/train/text_shape.bpe --valid_data_path_and_name_and_type dump/raw/dev/text,text,text --valid_shape_file exp/asr_stats_raw_bpe2000_sp/valid/text_shape.bpe --ngpu 1 --multiprocessing_distributed True
# Started at Sun May 14 23:09:44 CST 2023
# /mnt/bd/khassan-volume3/tools/espent_KSC_recipe_test/tools/miniconda/envs/espnet/bin/python3 /mnt/bd/khassan-volume3/tools/espent_KSC_recipe_test/espnet2/bin/asr_train.py --use_preprocessor true --bpemodel data/token_list/bpe_unigram2000/bpe.model --token_type bpe --token_list data/token_list/bpe_unigram2000/tokens.txt --non_linguistic_symbols none --cleaner none --g2p none --valid_data_path_and_name_and_type dump/raw/dev/wav.scp,speech,sound --valid_shape_file exp/asr_stats_raw_bpe2000_sp/valid/speech_shape --resume true --ignore_init_mismatch false --fold_length 80000 --output_dir exp/asr_train_raw_bpe2000_sp --config conf/train.yaml --frontend_conf fs=16k --normalize=global_mvn --normalize_conf stats_file=exp/asr_stats_raw_bpe2000_sp/train/feats_stats.npz --train_data_path_and_name_and_type dump/raw/train_sp/wav.scp,speech,sound --train_shape_file exp/asr_stats_raw_bpe2000_sp/train/speech_shape --fold_length 150 --train_data_path_and_name_and_type dump/raw/train_sp/text,text,text --train_shape_file exp/asr_stats_raw_bpe2000_sp/train/text_shape.bpe --valid_data_path_and_name_and_type dump/raw/dev/text,text,text --valid_shape_file exp/asr_stats_raw_bpe2000_sp/valid/text_shape.bpe --ngpu 1 --multiprocessing_distributed True
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:51,285 (asr:500) INFO: Vocabulary size: 2000
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,010 (initialize:88) INFO: Initialize encoder.embed.conv.0.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.embed.conv.2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.embed.out.0.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.0.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.0.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.0.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.0.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.0.feed_forward.w_2.bias to zeros
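The `--normalize=global_mvn` / `--normalize_conf stats_file=...feats_stats.npz` flags above apply global mean-variance normalization from statistics collected in the stats stage. A minimal sketch of the arithmetic, assuming the stats file holds a per-dimension sum, sum of squares, and frame count (the exact `.npz` key names are an assumption, not read from this log):

```python
import numpy as np

def mvn_from_stats(feat_sum, feat_sumsq, count):
    """Derive the global mean/std used by GlobalMVN from accumulated stats.

    feat_sum / feat_sumsq: per-dimension sums over all training frames;
    count: total number of frames. (Hypothetical layout for illustration.)
    """
    mean = feat_sum / count
    var = feat_sumsq / count - mean ** 2
    # Clamp variance to avoid division by zero on constant dimensions.
    return mean, np.sqrt(np.maximum(var, 1.0e-20))

def apply_global_mvn(feats, mean, std):
    """Normalize (frames, n_mels) features per dimension
    (norm_means=True, norm_vars=True, as in the model dump)."""
    return (feats - mean) / std
```

This is a sketch of the normalization math only; the real training run loads `exp/asr_stats_raw_bpe2000_sp/train/feats_stats.npz` through ESPnet's `GlobalMVN` layer.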
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.0.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.0.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.1.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.1.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.1.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.1.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.1.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.1.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.1.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.1.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.2.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.2.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.2.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.2.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.2.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.2.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.2.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.2.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.3.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.3.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.3.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,011 (initialize:88) INFO: Initialize encoder.encoders.3.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.3.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.3.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.3.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.3.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.4.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.4.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.4.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.4.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.4.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.4.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.4.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.4.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.5.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.5.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.5.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.5.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.5.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.5.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.5.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.5.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.6.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.6.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.6.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.6.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.6.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.6.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.6.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.6.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.7.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.7.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.7.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,012 (initialize:88) INFO: Initialize encoder.encoders.7.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.7.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.7.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.7.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.7.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.8.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.8.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.8.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.8.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.8.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.8.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.8.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.8.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.9.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.9.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.9.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.9.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.9.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.9.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.9.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.9.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.10.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.10.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.10.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.10.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.10.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.10.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.10.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.10.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.11.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.11.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.11.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,013 (initialize:88) INFO: Initialize encoder.encoders.11.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize encoder.encoders.11.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize encoder.encoders.11.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize encoder.encoders.11.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize encoder.encoders.11.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize encoder.after_norm.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.after_norm.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.output_layer.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.src_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.src_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.src_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.src_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.0.norm3.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.src_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.src_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.src_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.src_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,014 (initialize:88) INFO: Initialize decoder.decoders.1.norm3.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.src_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.src_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.src_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.src_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.2.norm3.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.src_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.src_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.src_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.src_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.3.norm3.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.4.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.4.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.4.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.4.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.4.src_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.4.src_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.4.src_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,015 (initialize:88) INFO: Initialize decoder.decoders.4.src_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.4.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.4.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.4.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.4.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.4.norm3.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.src_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.src_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.src_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.src_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize decoder.decoders.5.norm3.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:52,016 (initialize:88) INFO: Initialize ctc.ctc_lo.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:54,036 (abs_task:1201) INFO: pytorch.version=1.13.1, cuda.available=True, cudnn.version=8500, cudnn.benchmark=False, cudnn.deterministic=True
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:54,041 (abs_task:1202) INFO: Model structure: ESPnetASRModel(
  (frontend): DefaultFrontend(
    (stft): Stft(n_fft=512, win_length=512, hop_length=128, center=True, normalized=False, onesided=True)
    (frontend): Frontend()
    (logmel): LogMel(sr=16000, n_fft=512, n_mels=80, fmin=0, fmax=8000.0, htk=False)
  )
  (specaug): SpecAug(
    (time_warp): TimeWarp(window=5, mode=bicubic)
    (freq_mask): MaskAlongAxis(mask_width_range=[0, 27], num_mask=2, axis=freq)
    (time_mask): MaskAlongAxisVariableMaxWidth(mask_width_ratio_range=[0.0, 0.05], num_mask=10, axis=time)
  )
  (normalize): GlobalMVN(stats_file=exp/asr_stats_raw_bpe2000_sp/train/feats_stats.npz, norm_means=True, norm_vars=True)
  (encoder): TransformerEncoder(
    (embed): Conv2dSubsampling(
      (conv): Sequential(
        (0): Conv2d(1, 256, kernel_size=(3, 3), stride=(2, 2))
        (1): ReLU()
        (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2))
        (3): ReLU()
      )
      (out): Sequential(
        (0): Linear(in_features=4864, out_features=256, bias=True)
        (1): PositionalEncoding(
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (encoders): MultiSequential(
      (0): EncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_q): Linear(in_features=256, out_features=256, bias=True)
          (linear_k): Linear(in_features=256, out_features=256, bias=True)
          (linear_v): Linear(in_features=256, out_features=256, bias=True)
          (linear_out): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.0, inplace=False)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=256, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=256, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (activation): ReLU()
        )
        (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (1): EncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_q): Linear(in_features=256, out_features=256, bias=True)
          (linear_k): Linear(in_features=256, out_features=256, bias=True)
          (linear_v): Linear(in_features=256, out_features=256, bias=True)
          (linear_out): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.0, inplace=False)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=256, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=256, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (activation): ReLU()
        )
        (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (2): EncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_q): Linear(in_features=256, out_features=256, bias=True)
          (linear_k): Linear(in_features=256, out_features=256, bias=True)
          (linear_v): Linear(in_features=256, out_features=256, bias=True)
          (linear_out): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.0, inplace=False)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=256, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=256, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (activation): ReLU()
        )
        (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (3): EncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_q): Linear(in_features=256, out_features=256, bias=True)
          (linear_k): Linear(in_features=256, out_features=256, bias=True)
          (linear_v): Linear(in_features=256, out_features=256, bias=True)
          (linear_out): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.0, inplace=False)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=256, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=256, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (activation): ReLU()
        )
        (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (4): EncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_q): Linear(in_features=256, out_features=256, bias=True)
          (linear_k): Linear(in_features=256, out_features=256, bias=True)
          (linear_v): Linear(in_features=256, out_features=256, bias=True)
          (linear_out): Linear(in_features=256, out_features=256, bias=True)
          (dropout): Dropout(p=0.0, inplace=False)
        )
        (feed_forward): PositionwiseFeedForward(
          (w_1): Linear(in_features=256, out_features=2048, bias=True)
          (w_2): Linear(in_features=2048, out_features=256, bias=True)
          (dropout): Dropout(p=0.1, inplace=False)
          (activation): ReLU()
        )
        (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (5): EncoderLayer(
        (self_attn): MultiHeadedAttention(
          (linear_q): Linear(in_features=256, out_features=256, bias=True)
          (linear_k):
Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (6): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (7): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), 
eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (8): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (9): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (10): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, 
inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (11): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (after_norm): LayerNorm((256,), eps=1e-12, elementwise_affine=True) ) (decoder): TransformerDecoder( (embed): Sequential( (0): Embedding(2000, 256) (1): PositionalEncoding( (dropout): Dropout(p=0.1, inplace=False) ) ) (after_norm): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (output_layer): Linear(in_features=256, out_features=2000, bias=True) (decoders): MultiSequential( (0): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, 
out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (1): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (2): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, 
bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (3): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): 
LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (4): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (5): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): 
Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) (criterion_att): LabelSmoothingLoss( (criterion): KLDivLoss() ) (ctc): CTC( (ctc_lo): Linear(in_features=256, out_features=2000, bias=True) (ctc_loss): CTCLoss() ) ) Model summary: Class Name: ESPnetASRModel Total Number of model parameters: 28.63 M Number of trainable parameters: 28.63 M (100.0%) Size: 114.53 MB Type: torch.float32
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:54,041 (abs_task:1205) INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: False initial_lr: 0.0001 lr: 3.3333333333333334e-09 maximize: False weight_decay: 0 )
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:54,041 (abs_task:1206) INFO: Scheduler: WarmupLR(warmup_steps=30000)
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:54,042 (abs_task:1215) INFO: Saving the configuration in exp/asr_train_raw_bpe2000_sp/config.yaml
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:55,353 (asr:471) INFO: Optional Data Names: ('text_spk2', 'text_spk3', 'text_spk4')
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:57,906 (abs_task:1570) INFO: [train] dataset: ESPnetDataset( speech: {"path": "dump/raw/train_sp/wav.scp", "type": "sound"} text: {"path": "dump/raw/train_sp/text", "type": "text"} preprocess: )
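An aside on the encoder dump above: the Linear(in_features=4864, out_features=256) inside Conv2dSubsampling follows from the 80 log-mel bins passing through two unpadded kernel-3, stride-2 convolutions, which shrink the frequency axis 80 → 39 → 19; flattening 256 channels × 19 bins gives 4864. A quick sketch of that arithmetic (the `conv_out` helper is illustrative, not an ESPnet function):

```python
def conv_out(n: int, kernel: int = 3, stride: int = 2) -> int:
    """Output length of an unpadded 1-D convolution along one axis."""
    return (n - kernel) // stride + 1

freq = conv_out(conv_out(80))   # 80 mel bins -> 39 -> 19
print(256 * freq)               # flattened feature size fed to the Linear layer
```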
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:57,906 (abs_task:1571) INFO: [train] Batch sampler: FoldedBatchSampler(N-batch=7161, batch_size=128, shape_files=['exp/asr_stats_raw_bpe2000_sp/train/speech_shape', 'exp/asr_stats_raw_bpe2000_sp/train/text_shape.bpe'], sort_in_batch=descending, sort_batch=descending)
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:57,907 (abs_task:1572) INFO: [train] mini-batch sizes summary: N-batch=7161, mean=61.7, min=12, max=128
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:57,952 (asr:471) INFO: Optional Data Names: ('text_spk2', 'text_spk3', 'text_spk4')
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:57,968 (abs_task:1570) INFO: [valid] dataset: ESPnetDataset( speech: {"path": "dump/raw/dev/wav.scp", "type": "sound"} text: {"path": "dump/raw/dev/text", "type": "text"} preprocess: )
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:57,968 (abs_task:1571) INFO: [valid] Batch sampler: FoldedBatchSampler(N-batch=53, batch_size=128, shape_files=['exp/asr_stats_raw_bpe2000_sp/valid/speech_shape', 'exp/asr_stats_raw_bpe2000_sp/valid/text_shape.bpe'], sort_in_batch=descending, sort_batch=descending)
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:57,968 (abs_task:1572) INFO: [valid] mini-batch sizes summary: N-batch=53, mean=61.9, min=29, max=128
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:57,976 (asr:471) INFO: Optional Data Names: ('text_spk2', 'text_spk3', 'text_spk4')
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:57,999 (abs_task:1570) INFO: [plot_att] dataset: ESPnetDataset( speech: {"path": "dump/raw/dev/wav.scp", "type": "sound"} text: {"path": "dump/raw/dev/text", "type": "text"} preprocess: )
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:57,999 (abs_task:1571) INFO: [plot_att] Batch sampler: UnsortedBatchSampler(N-batch=3283, batch_size=1, key_file=exp/asr_stats_raw_bpe2000_sp/valid/speech_shape,
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:57,999 (abs_task:1572) INFO: [plot_att] mini-batch sizes summary: N-batch=3, mean=1.0, min=1, max=1
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:58,864 (trainer:159) INFO: The training was resumed using exp/asr_train_raw_bpe2000_sp/checkpoint.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:09:59,150 (trainer:284) INFO: 73/100epoch started
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:11:37,222 (trainer:732) INFO: 73epoch:train:1-358batch: iter_time=0.017, forward_time=0.107, loss_ctc=32.989, loss_att=16.308, acc=0.841, loss=21.313, backward_time=0.048, grad_norm=184.869, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.412e-05, train_time=0.273
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:13:10,529 (trainer:732) INFO: 73epoch:train:359-716batch: iter_time=0.011, forward_time=0.099, loss_ctc=34.275, loss_att=16.934, acc=0.839, loss=22.136, backward_time=0.047, grad_norm=185.802, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.411e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:14:43,225 (trainer:732) INFO: 73epoch:train:717-1074batch: iter_time=0.012, forward_time=0.098, loss_ctc=34.147, loss_att=16.918, acc=0.842, loss=22.087, backward_time=0.047, grad_norm=183.355, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.411e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:16:15,562 (trainer:732) INFO: 73epoch:train:1075-1432batch: iter_time=0.008, forward_time=0.100, loss_ctc=34.276, loss_att=16.972, acc=0.840, loss=22.163, backward_time=0.047, grad_norm=185.418, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.410e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:17:48,127 (trainer:732) INFO: 73epoch:train:1433-1790batch: iter_time=0.009, forward_time=0.101, loss_ctc=32.625, loss_att=16.075, acc=0.843, loss=21.040, backward_time=0.047, grad_norm=181.739, clip=100.000, loss_scale=690.771, optim_step_time=0.033, optim0_lr0=2.409e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:18:20,490 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:19:20,699 (trainer:732) INFO: 73epoch:train:1791-2148batch: iter_time=0.008, forward_time=0.102, loss_ctc=34.698, loss_att=17.184, acc=0.840, loss=22.438, backward_time=0.047, grad_norm=190.913, clip=100.000, loss_scale=691.272, optim_step_time=0.032, optim0_lr0=2.408e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:20:53,396 (trainer:732) INFO: 73epoch:train:2149-2506batch: iter_time=0.005, forward_time=0.104, loss_ctc=34.747, loss_att=17.155, acc=0.839, loss=22.432, backward_time=0.047, grad_norm=193.690, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.407e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:21:00,940 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
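The "grad norm is inf. Skipping updating the model" warnings, together with the fluctuating loss_scale values (512 → 690.771 → 256 and back), look like standard mixed-precision dynamic loss scaling: on an overflow the optimizer step is skipped and the scale backs off, while stretches of stable steps let it grow again (the non-power-of-two values in the log are presumably window averages). A simplified control-loop sketch in the spirit of torch.cuda.amp.GradScaler — constants are illustrative, not read from this run's config:

```python
import math

class DynamicLossScaler:
    """Simplified dynamic loss scaling (illustrative, not ESPnet's implementation)."""

    def __init__(self, scale=512.0, growth=2.0, backoff=0.5, growth_interval=2000):
        self.scale = scale
        self.growth = growth
        self.backoff = backoff
        self.growth_interval = growth_interval
        self._stable = 0

    def step(self, grad_norm: float) -> bool:
        """Return True if the optimizer update should be skipped."""
        if math.isinf(grad_norm) or math.isnan(grad_norm):
            self.scale *= self.backoff   # overflow: back off and skip this update
            self._stable = 0
            return True
        self._stable += 1
        if self._stable >= self.growth_interval:
            self.scale *= self.growth    # long stable stretch: grow the scale back
            self._stable = 0
        return False
```

With an initial scale of 512, one inf gradient drops the scale to 256, matching the loss_scale=256.000 intervals that follow the warnings above.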
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:22:26,820 (trainer:732) INFO: 73epoch:train:2507-2864batch: iter_time=0.008, forward_time=0.104, loss_ctc=32.554, loss_att=16.095, acc=0.843, loss=21.033, backward_time=0.047, grad_norm=184.312, clip=100.000, loss_scale=275.361, optim_step_time=0.033, optim0_lr0=2.406e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:23:45,750 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:24:01,130 (trainer:732) INFO: 73epoch:train:2865-3222batch: iter_time=0.008, forward_time=0.105, loss_ctc=33.201, loss_att=16.511, acc=0.842, loss=21.518, backward_time=0.047, grad_norm=186.271, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.406e-05, train_time=0.263
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:25:35,025 (trainer:732) INFO: 73epoch:train:3223-3580batch: iter_time=0.010, forward_time=0.103, loss_ctc=31.259, loss_att=15.449, acc=0.845, loss=20.192, backward_time=0.047, grad_norm=182.266, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.405e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:27:00,670 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:27:08,025 (trainer:732) INFO: 73epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.104, loss_ctc=33.680, loss_att=16.606, acc=0.841, loss=21.728, backward_time=0.047, grad_norm=185.984, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.404e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:28:41,656 (trainer:732) INFO: 73epoch:train:3939-4296batch: iter_time=0.011, forward_time=0.102, loss_ctc=32.415, loss_att=16.006, acc=0.843, loss=20.929, backward_time=0.047, grad_norm=183.181, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.403e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:30:15,512 (trainer:732) INFO: 73epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.105, loss_ctc=33.946, loss_att=16.840, acc=0.840, loss=21.972, backward_time=0.048, grad_norm=188.723, clip=100.000, loss_scale=341.810, optim_step_time=0.033, optim0_lr0=2.402e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:30:29,688 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:30:47,569 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
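The slowly decaying optim0_lr0 values in these records (2.412e-05 → 2.40e-05 over the epoch) are consistent with ESPnet's WarmupLR scheduler logged above (warmup_steps=30000, initial_lr=1e-4): lr = base_lr * warmup_steps^0.5 * min(step^-0.5, step * warmup_steps^-1.5). A minimal sketch of the formula (a standalone function, not ESPnet's scheduler class):

```python
def warmup_lr(base_lr: float, warmup_steps: int, step: int) -> float:
    """Noam-style warmup schedule: linear warmup, then inverse-sqrt decay."""
    return base_lr * warmup_steps**0.5 * min(step**-0.5, step * warmup_steps**-1.5)

# At step 1 this gives ~3.33e-9, matching the "lr: 3.3333333333333334e-09"
# in the Adam optimizer dump; near step ~522k (total_count at the end of
# epoch 73) it gives ~2.4e-5, matching the optim0_lr0 values logged here.
```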
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:31:49,202 (trainer:732) INFO: 73epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.105, loss_ctc=34.806, loss_att=17.235, acc=0.837, loss=22.506, backward_time=0.048, grad_norm=197.161, clip=100.000, loss_scale=294.723, optim_step_time=0.033, optim0_lr0=2.401e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:33:23,728 (trainer:732) INFO: 73epoch:train:5013-5370batch: iter_time=0.011, forward_time=0.104, loss_ctc=32.688, loss_att=16.149, acc=0.842, loss=21.111, backward_time=0.047, grad_norm=185.098, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.401e-05, train_time=0.264
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:34:58,248 (trainer:732) INFO: 73epoch:train:5371-5728batch: iter_time=0.010, forward_time=0.104, loss_ctc=33.416, loss_att=16.547, acc=0.841, loss=21.608, backward_time=0.047, grad_norm=188.955, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.400e-05, train_time=0.264
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:36:32,467 (trainer:732) INFO: 73epoch:train:5729-6086batch: iter_time=0.010, forward_time=0.104, loss_ctc=32.617, loss_att=16.118, acc=0.839, loss=21.067, backward_time=0.047, grad_norm=181.839, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.399e-05, train_time=0.263
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:38:05,692 (trainer:732) INFO: 73epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.105, loss_ctc=34.518, loss_att=16.949, acc=0.841, loss=22.220, backward_time=0.047, grad_norm=189.033, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.398e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:39:39,022 (trainer:732) INFO: 73epoch:train:6445-6802batch: iter_time=0.008, forward_time=0.104, loss_ctc=33.116, loss_att=16.355, acc=0.841, loss=21.383, backward_time=0.047, grad_norm=190.834, clip=100.000, loss_scale=322.503, optim_step_time=0.033, optim0_lr0=2.397e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:41:13,003 (trainer:732) INFO: 73epoch:train:6803-7160batch: iter_time=0.006, forward_time=0.106, loss_ctc=34.635, loss_att=17.215, acc=0.838, loss=22.441, backward_time=0.047, grad_norm=191.783, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.397e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:42:22,667 (trainer:338) INFO: 73epoch results: [train] iter_time=0.009, forward_time=0.103, loss_ctc=33.511, loss_att=16.572, acc=0.841, loss=21.653, backward_time=0.047, grad_norm=187.056, clip=100.000, loss_scale=386.825, optim_step_time=0.033, optim0_lr0=2.404e-05, train_time=0.261, time=31 minutes and 14.45 seconds, total_count=522753, gpu_max_cached_mem_GB=23.475, [valid] loss_ctc=15.503, cer_ctc=0.081, loss_att=8.144, acc=0.921, cer=0.050, wer=0.686, loss=10.352, time=14.97 seconds, total_count=3869, gpu_max_cached_mem_GB=26.869, [att_plot] time=54.05 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:42:26,574 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:42:26,605 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/63epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:42:26,605 (trainer:272) INFO: 74/100epoch started. Estimated time to finish: 14 hours, 36 minutes and 21.29 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:43:56,077 (trainer:732) INFO: 74epoch:train:1-358batch: iter_time=0.003, forward_time=0.097, loss_ctc=33.180, loss_att=16.404, acc=0.845, loss=21.437, backward_time=0.051, grad_norm=182.135, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.396e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:45:24,839 (trainer:732) INFO: 74epoch:train:359-716batch: iter_time=4.061e-04, forward_time=0.097, loss_ctc=33.504, loss_att=16.497, acc=0.844, loss=21.599, backward_time=0.051, grad_norm=182.461, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.395e-05, train_time=0.248
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:46:54,108 (trainer:732) INFO: 74epoch:train:717-1074batch: iter_time=5.781e-04, forward_time=0.097, loss_ctc=33.459, loss_att=16.539, acc=0.844, loss=21.615, backward_time=0.051, grad_norm=186.022, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.394e-05, train_time=0.249
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:48:24,008 (trainer:732) INFO: 74epoch:train:1075-1432batch: iter_time=5.529e-04, forward_time=0.098, loss_ctc=33.758, loss_att=16.741, acc=0.841, loss=21.846, backward_time=0.051, grad_norm=187.804, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.393e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:49:13,058 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
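The per-batch losses in these records are consistent with ESPnet's joint CTC/attention objective, loss = w * loss_ctc + (1 - w) * loss_att with w = 0.3: for the first 74epoch record above, 0.3 * 33.180 + 0.7 * 16.404 ≈ 21.437, matching the logged loss. A sketch of that combination (ctc_weight=0.3 is inferred from the logged numbers, not read from conf/train.yaml):

```python
def joint_asr_loss(loss_ctc: float, loss_att: float, ctc_weight: float = 0.3) -> float:
    """Hybrid CTC/attention training loss as combined in ESPnet2 ASR."""
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att

# Reproduces the logged values: joint_asr_loss(33.180, 16.404) is ~21.437,
# the "loss" field of the 74epoch:train:1-358batch record.
```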
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:49:54,369 (trainer:732) INFO: 74epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.098, loss_ctc=34.785, loss_att=17.246, acc=0.839, loss=22.508, backward_time=0.051, grad_norm=188.429, clip=100.000, loss_scale=623.866, optim_step_time=0.033, optim0_lr0=2.392e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:51:06,824 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:51:24,301 (trainer:732) INFO: 74epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.098, loss_ctc=33.051, loss_att=16.333, acc=0.840, loss=21.348, backward_time=0.052, grad_norm=179.408, clip=100.000, loss_scale=461.804, optim_step_time=0.032, optim0_lr0=2.392e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:52:54,545 (trainer:732) INFO: 74epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.098, loss_ctc=34.425, loss_att=16.990, acc=0.843, loss=22.221, backward_time=0.052, grad_norm=187.258, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.391e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:54:25,546 (trainer:732) INFO: 74epoch:train:2507-2864batch: iter_time=0.005, forward_time=0.098, loss_ctc=31.504, loss_att=15.492, acc=0.845, loss=20.296, backward_time=0.052, grad_norm=185.182, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.390e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:55:56,952 (trainer:732) INFO: 74epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.098, loss_ctc=34.977, loss_att=17.265, acc=0.841, loss=22.579, backward_time=0.051, grad_norm=189.395, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.389e-05, train_time=0.255 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:57:27,617 (trainer:732) INFO: 74epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.098, loss_ctc=33.231, loss_att=16.423, acc=0.841, loss=21.466, backward_time=0.052, grad_norm=181.479, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.388e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:58:24,749 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:58:34,386 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-14 23:58:56,976 (trainer:732) INFO: 74epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.096, loss_ctc=33.991, loss_att=16.752, acc=0.840, loss=21.924, backward_time=0.051, grad_norm=193.539, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.388e-05, train_time=0.249
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:00:30,573 (trainer:732) INFO: 74epoch:train:3939-4296batch: iter_time=0.014, forward_time=0.097, loss_ctc=30.655, loss_att=15.146, acc=0.845, loss=19.798, backward_time=0.052, grad_norm=183.868, clip=100.000, loss_scale=411.888, optim_step_time=0.032, optim0_lr0=2.387e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:02:02,705 (trainer:732) INFO: 74epoch:train:4297-4654batch: iter_time=0.010, forward_time=0.097, loss_ctc=31.761, loss_att=15.681, acc=0.845, loss=20.505, backward_time=0.052, grad_norm=181.537, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.386e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:03:34,226 (trainer:732) INFO: 74epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.098, loss_ctc=33.980, loss_att=16.893, acc=0.838, loss=22.019, backward_time=0.052, grad_norm=193.564, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.385e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:05:06,681 (trainer:732) INFO: 74epoch:train:5013-5370batch: iter_time=0.009, forward_time=0.098, loss_ctc=32.265, loss_att=15.895, acc=0.845, loss=20.806, backward_time=0.052, grad_norm=180.984, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.384e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:06:37,003 (trainer:732) INFO: 74epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.096, loss_ctc=35.505, loss_att=17.574, acc=0.838, loss=22.953, backward_time=0.051, grad_norm=187.740, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.383e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:06:58,231 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:08:10,368 (trainer:732) INFO: 74epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.099, loss_ctc=33.652, loss_att=16.567, acc=0.842, loss=21.693, backward_time=0.052, grad_norm=192.960, clip=100.000, loss_scale=523.441, optim_step_time=0.033, optim0_lr0=2.383e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:08:42,727 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:09:41,529 (trainer:732) INFO: 74epoch:train:6087-6444batch: iter_time=0.004, forward_time=0.098, loss_ctc=33.835, loss_att=16.858, acc=0.840, loss=21.951, backward_time=0.052, grad_norm=186.781, clip=100.000, loss_scale=691.272, optim_step_time=0.033, optim0_lr0=2.382e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:11:13,951 (trainer:732) INFO: 74epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.097, loss_ctc=33.285, loss_att=16.453, acc=0.840, loss=21.503, backward_time=0.051, grad_norm=183.725, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.381e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:12:47,084 (trainer:732) INFO: 74epoch:train:6803-7160batch: iter_time=0.010, forward_time=0.098, loss_ctc=32.958, loss_att=16.216, acc=0.845, loss=21.239, backward_time=0.052, grad_norm=189.861, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.380e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:13:55,998 (trainer:338) INFO: 74epoch results: [train] iter_time=0.005, forward_time=0.097, loss_ctc=33.355, loss_att=16.481, acc=0.842, loss=21.543, backward_time=0.051, grad_norm=186.206, clip=100.000, loss_scale=455.564, optim_step_time=0.033, optim0_lr0=2.388e-05, train_time=0.254, time=30 minutes and 21.14 seconds, total_count=529914, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=15.442, cer_ctc=0.081, loss_att=8.097, acc=0.921, cer=0.050, wer=0.682, loss=10.301, time=14.8 seconds, total_count=3922, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.45 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:13:59,521 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:13:59,537 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/64epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:13:59,538 (trainer:272) INFO: 75/100epoch started. Estimated time to finish: 13 hours, 52 minutes and 5.04 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:15:29,143 (trainer:732) INFO: 75epoch:train:1-358batch: iter_time=0.003, forward_time=0.097, loss_ctc=32.905, loss_att=16.217, acc=0.846, loss=21.223, backward_time=0.051, grad_norm=185.420, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.379e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:16:58,906 (trainer:732) INFO: 75epoch:train:359-716batch: iter_time=0.002, forward_time=0.098, loss_ctc=34.216, loss_att=16.956, acc=0.839, loss=22.134, backward_time=0.051, grad_norm=185.612, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.379e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:18:27,943 (trainer:732) INFO: 75epoch:train:717-1074batch: iter_time=0.002, forward_time=0.096, loss_ctc=32.507, loss_att=16.014, acc=0.845, loss=20.962, backward_time=0.051, grad_norm=185.079, clip=100.000, loss_scale=544.894, optim_step_time=0.033, optim0_lr0=2.378e-05, train_time=0.248
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:18:36,216 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
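The loss columns in these entries are consistent with ESPnet's hybrid CTC/attention objective, loss = w * loss_ctc + (1 - w) * loss_att. With the commonly used w = 0.3 (an assumption here; the actual value lives in conf/train.yaml, which this log does not show), the epoch-74 averages reproduce the logged totals:

```python
def hybrid_loss(loss_ctc, loss_att, ctc_weight=0.3):
    """ESPnet-style joint CTC/attention objective.

    ctc_weight=0.3 is an assumed value (set in conf/train.yaml);
    it matches the epoch-74 averages logged above.
    """
    return ctc_weight * loss_ctc + (1.0 - ctc_weight) * loss_att

# Epoch 74 [train] averages: loss_ctc=33.355, loss_att=16.481, loss=21.543
print(round(hybrid_loss(33.355, 16.481), 3))  # 21.543
```

The same weight also reproduces the [valid] total (0.3 * 15.442 + 0.7 * 8.097 ≈ 10.301), which is a quick sanity check that the objective did not change between train and valid.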
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:19:57,294 (trainer:732) INFO: 75epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.096, loss_ctc=33.039, loss_att=16.295, acc=0.842, loss=21.318, backward_time=0.052, grad_norm=185.706, clip=100.000, loss_scale=557.894, optim_step_time=0.033, optim0_lr0=2.377e-05, train_time=0.249
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:20:45,078 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:21:27,761 (trainer:732) INFO: 75epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.097, loss_ctc=33.713, loss_att=16.678, acc=0.841, loss=21.788, backward_time=0.051, grad_norm=195.589, clip=100.000, loss_scale=391.529, optim_step_time=0.033, optim0_lr0=2.376e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:22:58,868 (trainer:732) INFO: 75epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.098, loss_ctc=33.426, loss_att=16.471, acc=0.844, loss=21.558, backward_time=0.051, grad_norm=191.421, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.375e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:24:29,355 (trainer:732) INFO: 75epoch:train:2149-2506batch: iter_time=0.006, forward_time=0.097, loss_ctc=31.988, loss_att=15.729, acc=0.845, loss=20.607, backward_time=0.052, grad_norm=177.910, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.375e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:26:02,106 (trainer:732) INFO: 75epoch:train:2507-2864batch: iter_time=0.008, forward_time=0.099, loss_ctc=32.715, loss_att=16.141, acc=0.843, loss=21.113, backward_time=0.051, grad_norm=185.561, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.374e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:27:32,863 (trainer:732) INFO: 75epoch:train:2865-3222batch: iter_time=0.007, forward_time=0.097, loss_ctc=32.352, loss_att=15.982, acc=0.844, loss=20.893, backward_time=0.052, grad_norm=185.326, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.373e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:29:05,182 (trainer:732) INFO: 75epoch:train:3223-3580batch: iter_time=0.009, forward_time=0.098, loss_ctc=32.498, loss_att=16.048, acc=0.844, loss=20.983, backward_time=0.051, grad_norm=186.353, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.372e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:30:07,210 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:30:35,833 (trainer:732) INFO: 75epoch:train:3581-3938batch: iter_time=0.008, forward_time=0.096, loss_ctc=31.962, loss_att=15.725, acc=0.844, loss=20.596, backward_time=0.051, grad_norm=183.961, clip=100.000, loss_scale=481.966, optim_step_time=0.032, optim0_lr0=2.371e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:32:07,589 (trainer:732) INFO: 75epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.098, loss_ctc=33.469, loss_att=16.565, acc=0.844, loss=21.636, backward_time=0.052, grad_norm=183.213, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.371e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:33:43,831 (trainer:732) INFO: 75epoch:train:4297-4654batch: iter_time=0.020, forward_time=0.097, loss_ctc=33.102, loss_att=16.332, acc=0.842, loss=21.363, backward_time=0.052, grad_norm=189.520, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.370e-05, train_time=0.269
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:35:16,411 (trainer:732) INFO: 75epoch:train:4655-5012batch: iter_time=0.007, forward_time=0.099, loss_ctc=34.744, loss_att=17.190, acc=0.842, loss=22.456, backward_time=0.051, grad_norm=187.327, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.369e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:36:49,324 (trainer:732) INFO: 75epoch:train:5013-5370batch: iter_time=0.008, forward_time=0.099, loss_ctc=34.340, loss_att=16.943, acc=0.843, loss=22.162, backward_time=0.051, grad_norm=192.966, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.368e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:37:02,767 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:37:11,889 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:38:14,943 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:38:21,871 (trainer:732) INFO: 75epoch:train:5371-5728batch: iter_time=0.009, forward_time=0.098, loss_ctc=34.347, loss_att=16.980, acc=0.840, loss=22.190, backward_time=0.051, grad_norm=193.613, clip=100.000, loss_scale=626.734, optim_step_time=0.032, optim0_lr0=2.367e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:39:55,663 (trainer:732) INFO: 75epoch:train:5729-6086batch: iter_time=0.012, forward_time=0.098, loss_ctc=33.312, loss_att=16.514, acc=0.842, loss=21.553, backward_time=0.052, grad_norm=187.721, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.367e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:41:30,640 (trainer:732) INFO: 75epoch:train:6087-6444batch: iter_time=0.013, forward_time=0.099, loss_ctc=34.012, loss_att=16.847, acc=0.842, loss=21.996, backward_time=0.052, grad_norm=189.243, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.366e-05, train_time=0.265
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:43:03,387 (trainer:732) INFO: 75epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.097, loss_ctc=33.213, loss_att=16.444, acc=0.842, loss=21.475, backward_time=0.051, grad_norm=188.746, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.365e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:44:37,141 (trainer:732) INFO: 75epoch:train:6803-7160batch: iter_time=0.013, forward_time=0.097, loss_ctc=32.643, loss_att=16.117, acc=0.842, loss=21.075, backward_time=0.051, grad_norm=181.626, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.364e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:45:45,489 (trainer:338) INFO: 75epoch results: [train] iter_time=0.008, forward_time=0.097, loss_ctc=33.219, loss_att=16.406, acc=0.843, loss=21.450, backward_time=0.051, grad_norm=187.089, clip=100.000, loss_scale=450.128, optim_step_time=0.033, optim0_lr0=2.372e-05, train_time=0.256, time=30 minutes and 38.25 seconds, total_count=537075, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=15.289, cer_ctc=0.081, loss_att=7.996, acc=0.923, cer=0.049, wer=0.672, loss=10.184, time=14.65 seconds, total_count=3975, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.05 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:45:49,180 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:45:49,197 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/67epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:45:49,197 (trainer:272) INFO: 76/100epoch started. Estimated time to finish: 13 hours, 18 minutes and 37.06 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:46:39,711 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:47:20,906 (trainer:732) INFO: 76epoch:train:1-358batch: iter_time=0.003, forward_time=0.100, loss_ctc=33.992, loss_att=16.796, acc=0.842, loss=21.955, backward_time=0.052, grad_norm=190.101, clip=100.000, loss_scale=396.549, optim_step_time=0.032, optim0_lr0=2.364e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:48:39,716 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
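The pairing of "The best model has been updated: valid.acc" with "The model files were removed: .../67epoch.pth" reflects keep-nbest checkpoint retention: after each epoch, only the top-N checkpoints by the monitored validation metric stay on disk, and the worst surviving epoch (here 67) is deleted. A hedged sketch of that pruning rule (the real trainer takes N from --keep_nbest_models; N=10 below is an assumption):

```python
def prune_checkpoints(valid_acc_by_epoch, keep_nbest=10):
    """Return the epochs whose .pth files should be removed, keeping only
    the keep_nbest best epochs by validation accuracy (higher is better).

    valid_acc_by_epoch: dict mapping epoch number -> valid.acc.
    Sketch of ESPnet's keep-nbest behavior, not its exact code.
    """
    ranked = sorted(valid_acc_by_epoch,
                    key=lambda e: valid_acc_by_epoch[e], reverse=True)
    return sorted(ranked[keep_nbest:])  # everything outside the top N
```

With ten kept checkpoints and epoch 75 newly entering the top set, the epoch with the lowest retained valid.acc (67 in this log) falls out and its file is removed.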
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:48:50,962 (trainer:732) INFO: 76epoch:train:359-716batch: iter_time=0.002, forward_time=0.097, loss_ctc=34.779, loss_att=17.127, acc=0.841, loss=22.422, backward_time=0.051, grad_norm=198.020, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.363e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:50:22,107 (trainer:732) INFO: 76epoch:train:717-1074batch: iter_time=0.003, forward_time=0.099, loss_ctc=33.871, loss_att=16.767, acc=0.843, loss=21.898, backward_time=0.052, grad_norm=194.147, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.362e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:51:51,918 (trainer:732) INFO: 76epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.097, loss_ctc=33.506, loss_att=16.490, acc=0.842, loss=21.595, backward_time=0.051, grad_norm=187.609, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.361e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:52:08,647 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:53:23,174 (trainer:732) INFO: 76epoch:train:1433-1790batch: iter_time=0.005, forward_time=0.098, loss_ctc=32.048, loss_att=15.818, acc=0.848, loss=20.687, backward_time=0.051, grad_norm=182.299, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.360e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:54:57,141 (trainer:732) INFO: 76epoch:train:1791-2148batch: iter_time=0.014, forward_time=0.097, loss_ctc=31.566, loss_att=15.484, acc=0.847, loss=20.308, backward_time=0.051, grad_norm=178.734, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.360e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:56:27,657 (trainer:732) INFO: 76epoch:train:2149-2506batch: iter_time=0.006, forward_time=0.097, loss_ctc=31.596, loss_att=15.586, acc=0.847, loss=20.389, backward_time=0.052, grad_norm=184.103, clip=100.000, loss_scale=476.961, optim_step_time=0.033, optim0_lr0=2.359e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:57:58,266 (trainer:732) INFO: 76epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.098, loss_ctc=36.292, loss_att=17.992, acc=0.840, loss=23.482, backward_time=0.051, grad_norm=194.992, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.358e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 00:59:31,516 (trainer:732) INFO: 76epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.101, loss_ctc=33.011, loss_att=16.206, acc=0.842, loss=21.247, backward_time=0.052, grad_norm=190.068, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.357e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:01:04,511 (trainer:732) INFO: 76epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.103, loss_ctc=34.297, loss_att=16.949, acc=0.843, loss=22.154, backward_time=0.051, grad_norm=186.432, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.356e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:02:35,753 (trainer:732) INFO: 76epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.101, loss_ctc=31.958, loss_att=15.751, acc=0.843, loss=20.613, backward_time=0.051, grad_norm=185.304, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.356e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:03:45,518 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:04:08,756 (trainer:732) INFO: 76epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.103, loss_ctc=31.956, loss_att=15.779, acc=0.845, loss=20.632, backward_time=0.051, grad_norm=184.780, clip=100.000, loss_scale=522.039, optim_step_time=0.033, optim0_lr0=2.355e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:05:40,198 (trainer:732) INFO: 76epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.102, loss_ctc=33.222, loss_att=16.424, acc=0.844, loss=21.463, backward_time=0.052, grad_norm=186.969, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.354e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:07:13,958 (trainer:732) INFO: 76epoch:train:4655-5012batch: iter_time=0.008, forward_time=0.103, loss_ctc=32.685, loss_att=16.175, acc=0.844, loss=21.128, backward_time=0.051, grad_norm=184.382, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.353e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:08:46,732 (trainer:732) INFO: 76epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.103, loss_ctc=32.503, loss_att=15.942, acc=0.846, loss=20.910, backward_time=0.051, grad_norm=188.058, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.353e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:10:20,656 (trainer:732) INFO: 76epoch:train:5371-5728batch: iter_time=0.007, forward_time=0.104, loss_ctc=32.748, loss_att=16.156, acc=0.843, loss=21.134, backward_time=0.052, grad_norm=184.262, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.352e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:11:54,629 (trainer:732) INFO: 76epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.104, loss_ctc=32.845, loss_att=16.196, acc=0.844, loss=21.191, backward_time=0.051, grad_norm=186.803, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.351e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:12:17,432 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:12:43,721 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:13:28,066 (trainer:732) INFO: 76epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.102, loss_ctc=32.264, loss_att=15.920, acc=0.845, loss=20.823, backward_time=0.051, grad_norm=189.195, clip=100.000, loss_scale=612.392, optim_step_time=0.032, optim0_lr0=2.350e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:15:02,597 (trainer:732) INFO: 76epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.104, loss_ctc=33.796, loss_att=16.777, acc=0.841, loss=21.883, backward_time=0.052, grad_norm=191.803, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.349e-05, train_time=0.264
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:16:26,060 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:16:35,547 (trainer:732) INFO: 76epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.102, loss_ctc=32.813, loss_att=16.154, acc=0.844, loss=21.152, backward_time=0.052, grad_norm=189.150, clip=100.000, loss_scale=485.468, optim_step_time=0.032, optim0_lr0=2.349e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:17:44,457 (trainer:338) INFO: 76epoch results: [train] iter_time=0.005, forward_time=0.101, loss_ctc=33.063, loss_att=16.312, acc=0.844, loss=21.337, backward_time=0.051, grad_norm=187.857, clip=100.000, loss_scale=444.611, optim_step_time=0.033, optim0_lr0=2.356e-05, train_time=0.258, time=30 minutes and 47.09 seconds, total_count=544236, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=15.248, cer_ctc=0.079, loss_att=7.991, acc=0.923, cer=0.048, wer=0.675, loss=10.169, time=14.97 seconds, total_count=4028, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.2 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
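The "Estimated time to finish" figures shrink by roughly one epoch's wall time per epoch, which suggests a simple projection: average completed-epoch duration multiplied by the epochs remaining. A sketch under that assumption (the exact averaging ESPnet uses is not shown in this log); epoch 76 here took about 30 min 47 s of training plus roughly 68 s of validation and attention plotting, i.e. about 1915 s:

```python
def estimate_finish(epoch_durations_s, epochs_done, max_epoch=100):
    """Rough ETA in the style of the trainer's "Estimated time to finish"
    lines: mean wall time of completed epochs times the epochs left.
    Sketch only; the exact formula ESPnet uses is an assumption here.
    """
    avg = sum(epoch_durations_s) / len(epoch_durations_s)
    remaining_s = int((max_epoch - epochs_done) * avg)
    h, rem = divmod(remaining_s, 3600)
    m, s = divmod(rem, 60)
    return f"{h} hours, {m} minutes and {s} seconds"

print(estimate_finish([1915.0], 76))  # 12 hours, 46 minutes and 0 seconds
```

That lands within a minute of the logged "12 hours, 46 minutes and 53.49 seconds" at the start of epoch 77, supporting the per-epoch-average reading.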
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:17:48,048 (trainer:384) INFO: There are no improvements in this epoch
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:17:48,065 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/66epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:17:48,065 (trainer:272) INFO: 77/100epoch started. Estimated time to finish: 12 hours, 46 minutes and 53.49 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:19:20,504 (trainer:732) INFO: 77epoch:train:1-358batch: iter_time=0.005, forward_time=0.102, loss_ctc=32.712, loss_att=16.145, acc=0.845, loss=21.115, backward_time=0.051, grad_norm=192.256, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.348e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:20:50,164 (trainer:732) INFO: 77epoch:train:359-716batch: iter_time=0.001, forward_time=0.100, loss_ctc=32.713, loss_att=16.114, acc=0.844, loss=21.094, backward_time=0.051, grad_norm=191.680, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.347e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:22:22,129 (trainer:732) INFO: 77epoch:train:717-1074batch: iter_time=0.001, forward_time=0.103, loss_ctc=33.758, loss_att=16.622, acc=0.844, loss=21.763, backward_time=0.052, grad_norm=189.224, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.346e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:23:54,243 (trainer:732) INFO: 77epoch:train:1075-1432batch: iter_time=6.043e-04, forward_time=0.104, loss_ctc=34.380, loss_att=16.986, acc=0.841, loss=22.204, backward_time=0.052, grad_norm=189.070, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.346e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:25:25,961 (trainer:732) INFO: 77epoch:train:1433-1790batch: iter_time=0.005, forward_time=0.102, loss_ctc=32.102, loss_att=15.807, acc=0.845, loss=20.695, backward_time=0.052, grad_norm=182.815, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.345e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:26:57,623 (trainer:732) INFO: 77epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.102, loss_ctc=31.779, loss_att=15.614, acc=0.847, loss=20.464, backward_time=0.051, grad_norm=183.559, clip=100.000, loss_scale=389.006, optim_step_time=0.033, optim0_lr0=2.344e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:28:30,392 (trainer:732) INFO: 77epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.104, loss_ctc=33.342, loss_att=16.410, acc=0.845, loss=21.490, backward_time=0.052, grad_norm=186.646, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.343e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:30:02,942 (trainer:732) INFO: 77epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.104, loss_ctc=32.223, loss_att=15.884, acc=0.845, loss=20.786, backward_time=0.051, grad_norm=186.132, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.343e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:31:37,304 (trainer:732) INFO: 77epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.105, loss_ctc=33.553, loss_att=16.610, acc=0.842, loss=21.693, backward_time=0.051, grad_norm=190.543, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.342e-05, train_time=0.263
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:31:46,384 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:32:57,578 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:33:10,241 (trainer:732) INFO: 77epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.103, loss_ctc=31.630, loss_att=15.520, acc=0.848, loss=20.353, backward_time=0.052, grad_norm=184.856, clip=100.000, loss_scale=280.381, optim_step_time=0.033, optim0_lr0=2.341e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:34:43,600 (trainer:732) INFO: 77epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.103, loss_ctc=31.995, loss_att=15.741, acc=0.847, loss=20.617, backward_time=0.052, grad_norm=184.561, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.340e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:34:53,689 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:36:16,571 (trainer:732) INFO: 77epoch:train:3939-4296batch: iter_time=0.002, forward_time=0.104, loss_ctc=34.659, loss_att=17.174, acc=0.842, loss=22.420, backward_time=0.052, grad_norm=199.042, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.340e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:37:49,636 (trainer:732) INFO: 77epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.103, loss_ctc=33.299, loss_att=16.404, acc=0.843, loss=21.473, backward_time=0.052, grad_norm=188.920, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.339e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:39:23,435 (trainer:732) INFO: 77epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.104, loss_ctc=32.958, loss_att=16.255, acc=0.843, loss=21.266, backward_time=0.052, grad_norm=188.597, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.338e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:40:56,227 (trainer:732) INFO: 77epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.103, loss_ctc=32.525, loss_att=16.065, acc=0.846, loss=21.003, backward_time=0.051, grad_norm=184.897, clip=100.000, loss_scale=336.804, optim_step_time=0.033, optim0_lr0=2.337e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:42:30,098 (trainer:732) INFO: 77epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.105, loss_ctc=33.105, loss_att=16.379, acc=0.844, loss=21.397, backward_time=0.052, grad_norm=190.295, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.336e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:44:03,276 (trainer:732) INFO: 77epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.104, loss_ctc=32.520, loss_att=16.089, acc=0.846, loss=21.018, backward_time=0.052, grad_norm=184.822, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.336e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:44:19,769 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:44:20,194 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:45:38,258 (trainer:732) INFO: 77epoch:train:6087-6444batch: iter_time=0.010, forward_time=0.104, loss_ctc=32.154, loss_att=15.898, acc=0.846, loss=20.775, backward_time=0.052, grad_norm=187.093, clip=100.000, loss_scale=301.176, optim_step_time=0.032, optim0_lr0=2.335e-05, train_time=0.265
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:47:12,455 (trainer:732) INFO: 77epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.105, loss_ctc=33.270, loss_att=16.411, acc=0.847, loss=21.469, backward_time=0.052, grad_norm=188.635, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.334e-05, train_time=0.263
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:48:47,473 (trainer:732) INFO: 77epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.104, loss_ctc=33.250, loss_att=16.421, acc=0.843, loss=21.470, backward_time=0.051, grad_norm=187.039, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.333e-05, train_time=0.265
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:49:56,340 (trainer:338) INFO: 77epoch results: [train] iter_time=0.005, forward_time=0.103, loss_ctc=32.880, loss_att=16.219, acc=0.845, loss=21.218, backward_time=0.052, grad_norm=188.028, clip=100.000, loss_scale=334.170, optim_step_time=0.033, optim0_lr0=2.341e-05, train_time=0.259, time=31 minutes and 0.07 seconds, total_count=551397, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=15.062, cer_ctc=0.078, loss_att=7.929, acc=0.923, cer=0.049, wer=0.673, loss=10.069, time=15.01 seconds, total_count=4081, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.19 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:49:59,872 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:49:59,887 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/65epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:49:59,888 (trainer:272) INFO: 78/100epoch started. Estimated time to finish: 12 hours, 16 minutes and 3.39 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:51:31,832 (trainer:732) INFO: 78epoch:train:1-358batch: iter_time=0.005, forward_time=0.099, loss_ctc=33.199, loss_att=16.243, acc=0.846, loss=21.330, backward_time=0.052, grad_norm=188.080, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.333e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:51:38,540 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:53:00,999 (trainer:732) INFO: 78epoch:train:359-716batch: iter_time=5.151e-04, forward_time=0.097, loss_ctc=35.045, loss_att=17.274, acc=0.841, loss=22.606, backward_time=0.051, grad_norm=192.421, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.332e-05, train_time=0.249
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:54:31,524 (trainer:732) INFO: 78epoch:train:717-1074batch: iter_time=0.003, forward_time=0.098, loss_ctc=32.262, loss_att=15.880, acc=0.846, loss=20.795, backward_time=0.051, grad_norm=181.339, clip=100.000, loss_scale=316.782, optim_step_time=0.033, optim0_lr0=2.331e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:56:02,775 (trainer:732) INFO: 78epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.099, loss_ctc=33.244, loss_att=16.373, acc=0.847, loss=21.435, backward_time=0.051, grad_norm=192.212, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.330e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:56:53,248 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:57:33,197 (trainer:732) INFO: 78epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.098, loss_ctc=31.854, loss_att=15.645, acc=0.848, loss=20.508, backward_time=0.051, grad_norm=184.958, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.330e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:57:35,976 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 01:59:04,169 (trainer:732) INFO: 78epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=33.722, loss_att=16.662, acc=0.844, loss=21.780, backward_time=0.052, grad_norm=186.830, clip=100.000, loss_scale=263.171, optim_step_time=0.032, optim0_lr0=2.329e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:00:36,862 (trainer:732) INFO: 78epoch:train:2149-2506batch: iter_time=0.008, forward_time=0.099, loss_ctc=32.690, loss_att=16.130, acc=0.845, loss=21.098, backward_time=0.051, grad_norm=186.381, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.328e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:02:07,284 (trainer:732) INFO: 78epoch:train:2507-2864batch: iter_time=0.006, forward_time=0.097, loss_ctc=31.921, loss_att=15.700, acc=0.846, loss=20.566, backward_time=0.052, grad_norm=188.084, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.327e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:03:39,109 (trainer:732) INFO: 78epoch:train:2865-3222batch: iter_time=0.006, forward_time=0.098, loss_ctc=33.508, loss_att=16.468, acc=0.844, loss=21.580, backward_time=0.051, grad_norm=189.294, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.327e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:05:11,962 (trainer:732) INFO: 78epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.099, loss_ctc=33.937, loss_att=16.727, acc=0.843, loss=21.890, backward_time=0.051, grad_norm=187.593, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.326e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:06:43,412 (trainer:732) INFO: 78epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.098, loss_ctc=31.707, loss_att=15.565, acc=0.849, loss=20.407, backward_time=0.051, grad_norm=184.369, clip=100.000, loss_scale=353.966, optim_step_time=0.033, optim0_lr0=2.325e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:08:15,456 (trainer:732) INFO: 78epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.099, loss_ctc=34.080, loss_att=16.825, acc=0.841, loss=22.001, backward_time=0.052, grad_norm=187.171, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.324e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:09:47,822 (trainer:732) INFO: 78epoch:train:4297-4654batch: iter_time=0.009, forward_time=0.098, loss_ctc=32.976, loss_att=16.280, acc=0.844, loss=21.289, backward_time=0.052, grad_norm=188.052, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.324e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:10:00,936 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:11:20,331 (trainer:732) INFO: 78epoch:train:4655-5012batch: iter_time=0.010, forward_time=0.098, loss_ctc=31.584, loss_att=15.586, acc=0.846, loss=20.385, backward_time=0.051, grad_norm=181.892, clip=100.000, loss_scale=291.137, optim_step_time=0.033, optim0_lr0=2.323e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:12:51,924 (trainer:732) INFO: 78epoch:train:5013-5370batch: iter_time=0.009, forward_time=0.097, loss_ctc=30.506, loss_att=14.992, acc=0.849, loss=19.646, backward_time=0.051, grad_norm=180.665, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.322e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:14:25,544 (trainer:732) INFO: 78epoch:train:5371-5728batch: iter_time=0.013, forward_time=0.098, loss_ctc=31.292, loss_att=15.387, acc=0.849, loss=20.159, backward_time=0.051, grad_norm=188.711, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.321e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:15:29,430 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:15:57,735 (trainer:732) INFO: 78epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.098, loss_ctc=34.303, loss_att=16.946, acc=0.842, loss=22.153, backward_time=0.052, grad_norm=192.111, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.321e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:17:32,146 (trainer:732) INFO: 78epoch:train:6087-6444batch: iter_time=0.011, forward_time=0.099, loss_ctc=34.196, loss_att=16.910, acc=0.842, loss=22.096, backward_time=0.051, grad_norm=188.727, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.320e-05, train_time=0.263
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:19:05,356 (trainer:732) INFO: 78epoch:train:6445-6802batch: iter_time=0.011, forward_time=0.098, loss_ctc=31.767, loss_att=15.657, acc=0.846, loss=20.490, backward_time=0.051, grad_norm=183.379, clip=100.000, loss_scale=326.078, optim_step_time=0.032, optim0_lr0=2.319e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:20:38,915 (trainer:732) INFO: 78epoch:train:6803-7160batch: iter_time=0.014, forward_time=0.097, loss_ctc=32.226, loss_att=15.891, acc=0.847, loss=20.792, backward_time=0.051, grad_norm=189.860, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.318e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:21:47,581 (trainer:338) INFO: 78epoch results: [train] iter_time=0.007, forward_time=0.098, loss_ctc=32.770, loss_att=16.141, acc=0.845, loss=21.130, backward_time=0.051, grad_norm=187.091, clip=100.000, loss_scale=333.597, optim_step_time=0.033, optim0_lr0=2.326e-05, train_time=0.257, time=30 minutes and 39.73 seconds, total_count=558558, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=15.021, cer_ctc=0.079, loss_att=7.918, acc=0.923, cer=0.049, wer=0.669, loss=10.049, time=14.44 seconds, total_count=4134, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.52 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:21:51,164 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:21:51,195 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/68epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:21:51,195 (trainer:272) INFO: 79/100epoch started. Estimated time to finish: 11 hours, 43 minutes and 30.83 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:23:21,405 (trainer:732) INFO: 79epoch:train:1-358batch: iter_time=0.004, forward_time=0.097, loss_ctc=32.501, loss_att=15.934, acc=0.846, loss=20.904, backward_time=0.051, grad_norm=187.500, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.318e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:24:50,924 (trainer:732) INFO: 79epoch:train:359-716batch: iter_time=0.003, forward_time=0.097, loss_ctc=31.677, loss_att=15.573, acc=0.849, loss=20.404, backward_time=0.051, grad_norm=196.361, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.317e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:26:20,110 (trainer:732) INFO: 79epoch:train:717-1074batch: iter_time=0.002, forward_time=0.096, loss_ctc=32.405, loss_att=15.953, acc=0.846, loss=20.889, backward_time=0.051, grad_norm=187.611, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.316e-05, train_time=0.249
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:27:50,842 (trainer:732) INFO: 79epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.098, loss_ctc=32.595, loss_att=16.092, acc=0.847, loss=21.043, backward_time=0.051, grad_norm=188.028, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.315e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:28:30,376 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:29:20,675 (trainer:732) INFO: 79epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.097, loss_ctc=31.298, loss_att=15.389, acc=0.849, loss=20.161, backward_time=0.051, grad_norm=182.024, clip=100.000, loss_scale=575.104, optim_step_time=0.033, optim0_lr0=2.315e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:29:36,452 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:30:52,295 (trainer:732) INFO: 79epoch:train:1791-2148batch: iter_time=0.006, forward_time=0.098, loss_ctc=32.227, loss_att=15.929, acc=0.846, loss=20.818, backward_time=0.052, grad_norm=187.687, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.314e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:32:22,409 (trainer:732) INFO: 79epoch:train:2149-2506batch: iter_time=0.005, forward_time=0.096, loss_ctc=31.794, loss_att=15.699, acc=0.849, loss=20.527, backward_time=0.052, grad_norm=178.511, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.313e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:33:54,594 (trainer:732) INFO: 79epoch:train:2507-2864batch: iter_time=0.008, forward_time=0.097, loss_ctc=32.181, loss_att=15.874, acc=0.848, loss=20.766, backward_time=0.051, grad_norm=185.420, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.312e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:35:26,172 (trainer:732) INFO: 79epoch:train:2865-3222batch: iter_time=0.006, forward_time=0.098, loss_ctc=32.280, loss_att=15.888, acc=0.847, loss=20.806, backward_time=0.051, grad_norm=183.776, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.312e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:36:23,709 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:36:58,265 (trainer:732) INFO: 79epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.099, loss_ctc=35.485, loss_att=17.501, acc=0.840, loss=22.896, backward_time=0.051, grad_norm=189.442, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.311e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:37:16,795 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:38:30,122 (trainer:732) INFO: 79epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.097, loss_ctc=34.441, loss_att=16.924, acc=0.843, loss=22.179, backward_time=0.051, grad_norm=195.003, clip=100.000, loss_scale=600.919, optim_step_time=0.033, optim0_lr0=2.310e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:40:01,312 (trainer:732) INFO: 79epoch:train:3939-4296batch: iter_time=0.010, forward_time=0.096, loss_ctc=31.678, loss_att=15.605, acc=0.845, loss=20.427, backward_time=0.051, grad_norm=185.489, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.310e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:41:33,329 (trainer:732) INFO: 79epoch:train:4297-4654batch: iter_time=0.007, forward_time=0.098, loss_ctc=32.724, loss_att=16.029, acc=0.847, loss=21.037, backward_time=0.052, grad_norm=188.748, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.309e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:43:05,181 (trainer:732) INFO: 79epoch:train:4655-5012batch: iter_time=0.008, forward_time=0.097, loss_ctc=33.264, loss_att=16.401, acc=0.844, loss=21.460, backward_time=0.051, grad_norm=193.807, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.308e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:44:35,598 (trainer:732) INFO: 79epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.096, loss_ctc=32.046, loss_att=15.804, acc=0.846, loss=20.676, backward_time=0.051, grad_norm=186.023, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.307e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:46:07,494 (trainer:732) INFO: 79epoch:train:5371-5728batch: iter_time=0.008, forward_time=0.097, loss_ctc=32.136, loss_att=15.873, acc=0.849, loss=20.752, backward_time=0.051, grad_norm=191.982, clip=100.000, loss_scale=622.123, optim_step_time=0.033, optim0_lr0=2.307e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:46:45,594 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:47:42,788 (trainer:732) INFO: 79epoch:train:5729-6086batch: iter_time=0.015, forward_time=0.099, loss_ctc=31.706, loss_att=15.638, acc=0.848, loss=20.459, backward_time=0.051, grad_norm=186.314, clip=100.000, loss_scale=718.521, optim_step_time=0.033, optim0_lr0=2.306e-05, train_time=0.266
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:49:17,559 (trainer:732) INFO: 79epoch:train:6087-6444batch: iter_time=0.013, forward_time=0.098, loss_ctc=32.494, loss_att=16.058, acc=0.846, loss=20.989, backward_time=0.052, grad_norm=185.776, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.305e-05, train_time=0.264
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:50:51,073 (trainer:732) INFO: 79epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.098, loss_ctc=34.117, loss_att=16.844, acc=0.843, loss=22.026, backward_time=0.051, grad_norm=189.980, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.304e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:52:05,830 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:52:25,124 (trainer:732) INFO: 79epoch:train:6803-7160batch: iter_time=0.012, forward_time=0.098, loss_ctc=33.288, loss_att=16.419, acc=0.846, loss=21.480, backward_time=0.051, grad_norm=186.675, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.304e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:53:33,431 (trainer:338) INFO: 79epoch results: [train] iter_time=0.007, forward_time=0.097, loss_ctc=32.605, loss_att=16.066, acc=0.846, loss=21.027, backward_time=0.051, grad_norm=187.817, clip=100.000, loss_scale=535.390, optim_step_time=0.033, optim0_lr0=2.311e-05, train_time=0.256, time=30 minutes and 34.59 seconds, total_count=565719, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.979, cer_ctc=0.078, loss_att=7.848, acc=0.923, cer=0.048, wer=0.671, loss=9.987, time=14.52 seconds, total_count=4187, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.12 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:53:37,209 (trainer:384) INFO: There are no improvements in this epoch
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:53:37,241 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/70epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:53:37,241 (trainer:272) INFO: 80/100epoch started. Estimated time to finish: 11 hours, 10 minutes and 54.27 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:55:06,755 (trainer:732) INFO: 80epoch:train:1-358batch: iter_time=0.002, forward_time=0.097, loss_ctc=32.819, loss_att=16.151, acc=0.844, loss=21.151, backward_time=0.051, grad_norm=185.559, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.303e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:56:36,225 (trainer:732) INFO: 80epoch:train:359-716batch: iter_time=0.003, forward_time=0.096, loss_ctc=32.653, loss_att=16.088, acc=0.847, loss=21.057, backward_time=0.051, grad_norm=182.482, clip=100.000, loss_scale=517.721, optim_step_time=0.032, optim0_lr0=2.302e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:56:50,295 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:58:06,207 (trainer:732) INFO: 80epoch:train:717-1074batch: iter_time=0.002, forward_time=0.098, loss_ctc=33.319, loss_att=16.398, acc=0.848, loss=21.474, backward_time=0.051, grad_norm=195.638, clip=100.000, loss_scale=590.880, optim_step_time=0.032, optim0_lr0=2.301e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 02:59:35,352 (trainer:732) INFO: 80epoch:train:1075-1432batch: iter_time=0.004, forward_time=0.096, loss_ctc=31.894, loss_att=15.722, acc=0.849, loss=20.573, backward_time=0.051, grad_norm=181.344, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.301e-05, train_time=0.249
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:01:05,689 (trainer:732) INFO: 80epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.097, loss_ctc=32.474, loss_att=16.096, acc=0.845, loss=21.009, backward_time=0.051, grad_norm=188.949, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.300e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:02:36,424 (trainer:732) INFO: 80epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.097, loss_ctc=31.086, loss_att=15.320, acc=0.848, loss=20.050, backward_time=0.052, grad_norm=179.952, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.299e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:04:07,363 (trainer:732) INFO: 80epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.098, loss_ctc=33.646, loss_att=16.465, acc=0.847, loss=21.620, backward_time=0.051, grad_norm=193.427, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.299e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:05:33,854 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:05:38,321 (trainer:732) INFO: 80epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.098, loss_ctc=32.784, loss_att=16.113, acc=0.845, loss=21.114, backward_time=0.051, grad_norm=190.038, clip=100.000, loss_scale=616.695, optim_step_time=0.033, optim0_lr0=2.298e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:07:11,323 (trainer:732) INFO: 80epoch:train:2865-3222batch: iter_time=0.008, forward_time=0.099, loss_ctc=32.401, loss_att=15.955, acc=0.849, loss=20.889, backward_time=0.052, grad_norm=184.548, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.297e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:08:44,545 (trainer:732) INFO: 80epoch:train:3223-3580batch: iter_time=0.011, forward_time=0.098, loss_ctc=30.782, loss_att=15.171, acc=0.849, loss=19.855, backward_time=0.051, grad_norm=180.530, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.296e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:10:17,428 (trainer:732) INFO: 80epoch:train:3581-3938batch: iter_time=0.009, forward_time=0.098, loss_ctc=32.696, loss_att=16.058, acc=0.849, loss=21.050, backward_time=0.051, grad_norm=185.850, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.296e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:11:51,280 (trainer:732) INFO: 80epoch:train:3939-4296batch: iter_time=0.011, forward_time=0.098, loss_ctc=32.406, loss_att=15.996, acc=0.847, loss=20.919, backward_time=0.051, grad_norm=181.292, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.295e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:13:00,301 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:13:22,707 (trainer:732) INFO: 80epoch:train:4297-4654batch: iter_time=0.007, forward_time=0.097, loss_ctc=34.008, loss_att=16.731, acc=0.845, loss=21.914, backward_time=0.051, grad_norm=191.259, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.294e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:14:23,193 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:14:55,267 (trainer:732) INFO: 80epoch:train:4655-5012batch: iter_time=0.010, forward_time=0.097, loss_ctc=31.955, loss_att=15.684, acc=0.850, loss=20.565, backward_time=0.052, grad_norm=189.368, clip=100.000, loss_scale=570.801, optim_step_time=0.032, optim0_lr0=2.294e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:16:28,047 (trainer:732) INFO: 80epoch:train:5013-5370batch: iter_time=0.010, forward_time=0.098, loss_ctc=32.311, loss_att=15.894, acc=0.846, loss=20.819, backward_time=0.051, grad_norm=189.826, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.293e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:18:00,623 (trainer:732) INFO: 80epoch:train:5371-5728batch: iter_time=0.010, forward_time=0.097, loss_ctc=32.875, loss_att=16.228, acc=0.845, loss=21.222, backward_time=0.051, grad_norm=197.387, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.292e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:19:33,394 (trainer:732) INFO: 80epoch:train:5729-6086batch: iter_time=0.010, forward_time=0.097, loss_ctc=33.505, loss_att=16.392, acc=0.844, loss=21.526, backward_time=0.051, grad_norm=191.834, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.291e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:20:36,886 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:21:05,320 (trainer:732) INFO: 80epoch:train:6087-6444batch: iter_time=0.010, forward_time=0.096, loss_ctc=31.664, loss_att=15.586, acc=0.847, loss=20.409, backward_time=0.052, grad_norm=188.105, clip=100.000, loss_scale=430.969, optim_step_time=0.033, optim0_lr0=2.291e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:22:38,777 (trainer:732) INFO: 80epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.098, loss_ctc=31.762, loss_att=15.591, acc=0.848, loss=20.442, backward_time=0.052, grad_norm=184.514, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=2.290e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:23:09,496 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:23:55,203 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:24:11,774 (trainer:732) INFO: 80epoch:train:6803-7160batch: iter_time=0.012, forward_time=0.097, loss_ctc=32.266, loss_att=15.946, acc=0.847, loss=20.842, backward_time=0.051, grad_norm=189.941, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.289e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:25:20,373 (trainer:338) INFO: 80epoch results: [train] iter_time=0.007, forward_time=0.097, loss_ctc=32.452, loss_att=15.973, acc=0.847, loss=20.917, backward_time=0.051, grad_norm=187.595, clip=100.000, loss_scale=494.688, optim_step_time=0.033, optim0_lr0=2.296e-05, train_time=0.256, time=30 minutes and 35.23 seconds, total_count=572880, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.951, cer_ctc=0.077, loss_att=7.823, acc=0.924, cer=0.048, wer=0.664, loss=9.962, time=14.53 seconds, total_count=4240, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.38 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:25:24,104 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:25:24,133 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/69epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:25:24,133 (trainer:272) INFO: 81/100epoch started. Estimated time to finish: 10 hours, 38 minutes and 32.46 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:26:56,117 (trainer:732) INFO: 81epoch:train:1-358batch: iter_time=0.002, forward_time=0.103, loss_ctc=32.887, loss_att=16.195, acc=0.848, loss=21.202, backward_time=0.051, grad_norm=189.426, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.289e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:28:28,123 (trainer:732) INFO: 81epoch:train:359-716batch: iter_time=0.001, forward_time=0.104, loss_ctc=31.207, loss_att=15.294, acc=0.851, loss=20.068, backward_time=0.051, grad_norm=179.493, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.288e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:29:59,257 (trainer:732) INFO: 81epoch:train:717-1074batch: iter_time=0.001, forward_time=0.103, loss_ctc=32.014, loss_att=15.712, acc=0.848, loss=20.603, backward_time=0.052, grad_norm=186.317, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.287e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:31:02,864 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:31:30,149 (trainer:732) INFO: 81epoch:train:1075-1432batch: iter_time=6.200e-04, forward_time=0.103, loss_ctc=32.171, loss_att=15.852, acc=0.847, loss=20.748, backward_time=0.052, grad_norm=181.894, clip=100.000, loss_scale=367.866, optim_step_time=0.032, optim0_lr0=2.286e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:33:02,410 (trainer:732) INFO: 81epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.103, loss_ctc=31.182, loss_att=15.333, acc=0.851, loss=20.087, backward_time=0.051, grad_norm=189.698, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.286e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:34:35,916 (trainer:732) INFO: 81epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.104, loss_ctc=34.009, loss_att=16.726, acc=0.843, loss=21.911, backward_time=0.051, grad_norm=189.723, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.285e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:34:36,620 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:36:10,012 (trainer:732) INFO: 81epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.106, loss_ctc=32.013, loss_att=15.775, acc=0.850, loss=20.646, backward_time=0.051, grad_norm=196.646, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.284e-05, train_time=0.263
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:37:43,372 (trainer:732) INFO: 81epoch:train:2507-2864batch: iter_time=0.005, forward_time=0.104, loss_ctc=31.659, loss_att=15.542, acc=0.850, loss=20.377, backward_time=0.051, grad_norm=186.866, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.284e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:39:15,312 (trainer:732) INFO: 81epoch:train:2865-3222batch: iter_time=0.002, forward_time=0.103, loss_ctc=33.303, loss_att=16.368, acc=0.848, loss=21.449, backward_time=0.051, grad_norm=186.659, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.283e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:40:29,449 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:40:48,327 (trainer:732) INFO: 81epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.102, loss_ctc=32.118, loss_att=15.810, acc=0.848, loss=20.703, backward_time=0.052, grad_norm=194.749, clip=100.000, loss_scale=436.916, optim_step_time=0.033, optim0_lr0=2.282e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:42:20,399 (trainer:732) INFO: 81epoch:train:3581-3938batch: iter_time=0.001, forward_time=0.103, loss_ctc=33.094, loss_att=16.267, acc=0.845, loss=21.315, backward_time=0.052, grad_norm=192.677, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.281e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:43:52,910 (trainer:732) INFO: 81epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.102, loss_ctc=31.410, loss_att=15.434, acc=0.848, loss=20.227, backward_time=0.051, grad_norm=184.684, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.281e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:45:28,044 (trainer:732) INFO: 81epoch:train:4297-4654batch: iter_time=0.011, forward_time=0.104, loss_ctc=31.911, loss_att=15.732, acc=0.846, loss=20.586, backward_time=0.052, grad_norm=191.190, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.280e-05, train_time=0.265
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:47:01,937 (trainer:732) INFO: 81epoch:train:4655-5012batch: iter_time=0.009, forward_time=0.103, loss_ctc=30.514, loss_att=15.078, acc=0.852, loss=19.709, backward_time=0.052, grad_norm=184.740, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.279e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:48:35,285 (trainer:732) INFO: 81epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.104, loss_ctc=32.742, loss_att=16.109, acc=0.847, loss=21.099, backward_time=0.051, grad_norm=186.278, clip=100.000, loss_scale=573.497, optim_step_time=0.033, optim0_lr0=2.279e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:48:50,680 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:50:08,821 (trainer:732) INFO: 81epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.103, loss_ctc=33.620, loss_att=16.599, acc=0.843, loss=21.706, backward_time=0.051, grad_norm=188.146, clip=100.000, loss_scale=598.050, optim_step_time=0.032, optim0_lr0=2.278e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:51:05,969 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:51:43,261 (trainer:732) INFO: 81epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.105, loss_ctc=33.603, loss_att=16.614, acc=0.845, loss=21.711, backward_time=0.051, grad_norm=189.752, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.277e-05, train_time=0.264
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:53:16,148 (trainer:732) INFO: 81epoch:train:6087-6444batch: iter_time=0.006, forward_time=0.103, loss_ctc=31.653, loss_att=15.588, acc=0.849, loss=20.408, backward_time=0.052, grad_norm=187.761, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.276e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:54:51,119 (trainer:732) INFO: 81epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.106, loss_ctc=32.874, loss_att=16.127, acc=0.848, loss=21.151, backward_time=0.051, grad_norm=190.710, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.276e-05, train_time=0.265
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:56:23,998 (trainer:732) INFO: 81epoch:train:6803-7160batch: iter_time=0.005, forward_time=0.103, loss_ctc=32.908, loss_att=16.188, acc=0.847, loss=21.204, backward_time=0.051, grad_norm=191.144, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.275e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:57:32,892 (trainer:338) INFO: 81epoch results: [train] iter_time=0.005, forward_time=0.103, loss_ctc=32.324, loss_att=15.907, acc=0.848, loss=20.832, backward_time=0.051, grad_norm=188.423, clip=100.000, loss_scale=406.010, optim_step_time=0.033, optim0_lr0=2.282e-05, train_time=0.260, time=31 minutes and 0.5 seconds, total_count=580041, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.786, cer_ctc=0.076, loss_att=7.771, acc=0.925, cer=0.048, wer=0.663, loss=9.876, time=15.01 seconds, total_count=4293, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.25 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:57:36,587 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:57:36,619 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/71epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:57:36,619 (trainer:272) INFO: 82/100epoch started. Estimated time to finish: 10 hours, 7 minutes and 12.43 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:58:52,455 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:59:08,362 (trainer:732) INFO: 82epoch:train:1-358batch: iter_time=0.005, forward_time=0.098, loss_ctc=33.478, loss_att=16.486, acc=0.846, loss=21.584, backward_time=0.051, grad_norm=194.423, clip=100.000, loss_scale=544.986, optim_step_time=0.032, optim0_lr0=2.274e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 03:59:35,980 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:00:39,194 (trainer:732) INFO: 82epoch:train:359-716batch: iter_time=0.003, forward_time=0.098, loss_ctc=33.419, loss_att=16.416, acc=0.848, loss=21.517, backward_time=0.052, grad_norm=198.560, clip=100.000, loss_scale=332.728, optim_step_time=0.032, optim0_lr0=2.274e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:02:09,516 (trainer:732) INFO: 82epoch:train:717-1074batch: iter_time=0.002, forward_time=0.098, loss_ctc=32.062, loss_att=15.768, acc=0.849, loss=20.656, backward_time=0.052, grad_norm=189.445, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.273e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:03:39,338 (trainer:732) INFO: 82epoch:train:1075-1432batch: iter_time=5.778e-04, forward_time=0.098, loss_ctc=33.632, loss_att=16.523, acc=0.847, loss=21.655, backward_time=0.051, grad_norm=189.190, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.272e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:05:09,514 (trainer:732) INFO: 82epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.097, loss_ctc=30.526, loss_att=14.985, acc=0.852, loss=19.647, backward_time=0.052, grad_norm=182.536, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.272e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:06:02,249 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:06:41,215 (trainer:732) INFO: 82epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.099, loss_ctc=33.204, loss_att=16.319, acc=0.846, loss=21.384, backward_time=0.051, grad_norm=193.275, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.271e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:08:12,887 (trainer:732) INFO: 82epoch:train:2149-2506batch: iter_time=0.005, forward_time=0.098, loss_ctc=32.793, loss_att=16.155, acc=0.847, loss=21.147, backward_time=0.051, grad_norm=195.444, clip=100.000, loss_scale=284.603, optim_step_time=0.032, optim0_lr0=2.270e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:09:44,329 (trainer:732) INFO: 82epoch:train:2507-2864batch: iter_time=0.007, forward_time=0.097, loss_ctc=31.437, loss_att=15.427, acc=0.850, loss=20.230, backward_time=0.051, grad_norm=186.640, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.269e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:11:14,633 (trainer:732) INFO: 82epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.097, loss_ctc=32.529, loss_att=15.984, acc=0.847, loss=20.947, backward_time=0.052, grad_norm=192.020, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.269e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:12:46,314 (trainer:732) INFO: 82epoch:train:3223-3580batch: iter_time=0.011, forward_time=0.096, loss_ctc=30.241, loss_att=14.813, acc=0.850, loss=19.441, backward_time=0.051, grad_norm=188.530, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.268e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:14:18,206 (trainer:732) INFO: 82epoch:train:3581-3938batch: iter_time=0.008, forward_time=0.097, loss_ctc=31.702, loss_att=15.565, acc=0.849, loss=20.406, backward_time=0.051, grad_norm=185.915, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.267e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:15:50,345 (trainer:732) INFO: 82epoch:train:3939-4296batch: iter_time=0.008, forward_time=0.098, loss_ctc=32.441, loss_att=16.002, acc=0.849, loss=20.934, backward_time=0.051, grad_norm=189.960, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.267e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:17:06,399 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:17:21,023 (trainer:732) INFO: 82epoch:train:4297-4654batch: iter_time=0.008, forward_time=0.096, loss_ctc=30.693, loss_att=15.058, acc=0.852, loss=19.748, backward_time=0.051, grad_norm=185.087, clip=100.000, loss_scale=697.008, optim_step_time=0.033, optim0_lr0=2.266e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:18:53,910 (trainer:732) INFO: 82epoch:train:4655-5012batch: iter_time=0.009, forward_time=0.098, loss_ctc=32.454, loss_att=15.947, acc=0.849, loss=20.899, backward_time=0.052, grad_norm=186.903, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.265e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:20:25,658 (trainer:732) INFO: 82epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.097, loss_ctc=32.285, loss_att=15.965, acc=0.847, loss=20.861, backward_time=0.052, grad_norm=188.895, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.265e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:21:57,459 (trainer:732) INFO: 82epoch:train:5371-5728batch: iter_time=0.010, forward_time=0.096, loss_ctc=30.858, loss_att=15.139, acc=0.851, loss=19.854, backward_time=0.051, grad_norm=187.707, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.264e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:23:30,666 (trainer:732) INFO: 82epoch:train:5729-6086batch: iter_time=0.011, forward_time=0.098, loss_ctc=31.728, loss_att=15.576, acc=0.848, loss=20.422, backward_time=0.052, grad_norm=187.377, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.263e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:23:33,614 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:25:02,261 (trainer:732) INFO: 82epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.097, loss_ctc=33.255, loss_att=16.364, acc=0.848, loss=21.431, backward_time=0.052, grad_norm=192.633, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.263e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:25:41,780 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:26:25,838 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:26:34,291 (trainer:732) INFO: 82epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.098, loss_ctc=32.423, loss_att=15.951, acc=0.849, loss=20.893, backward_time=0.052, grad_norm=187.796, clip=100.000, loss_scale=513.434, optim_step_time=0.033, optim0_lr0=2.262e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:28:08,275 (trainer:732) INFO: 82epoch:train:6803-7160batch: iter_time=0.012, forward_time=0.098, loss_ctc=32.699, loss_att=16.116, acc=0.849, loss=21.091, backward_time=0.051, grad_norm=189.806, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.261e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:29:16,564 (trainer:338) INFO: 82epoch results: [train] iter_time=0.007, forward_time=0.097, loss_ctc=32.176, loss_att=15.819, acc=0.849, loss=20.726, backward_time=0.051, grad_norm=189.598, clip=100.000, loss_scale=451.407, optim_step_time=0.033, optim0_lr0=2.268e-05, train_time=0.256, time=30 minutes and 32.31 seconds, total_count=587202, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.760, cer_ctc=0.076, loss_att=7.726, acc=0.925, cer=0.047, wer=0.666, loss=9.836, time=14.4 seconds, total_count=4346, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.23 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:29:20,105 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:29:20,133 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/72epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:29:20,133 (trainer:272) INFO: 83/100epoch started. Estimated time to finish: 9 hours, 34 minutes and 49.77 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:30:51,341 (trainer:732) INFO: 83epoch:train:1-358batch: iter_time=0.003, forward_time=0.101, loss_ctc=30.928, loss_att=15.180, acc=0.852, loss=19.905, backward_time=0.051, grad_norm=184.536, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.260e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:32:21,748 (trainer:732) INFO: 83epoch:train:359-716batch: iter_time=0.002, forward_time=0.099, loss_ctc=32.766, loss_att=16.071, acc=0.848, loss=21.080, backward_time=0.051, grad_norm=191.079, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.260e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:33:52,631 (trainer:732) INFO: 83epoch:train:717-1074batch: iter_time=0.003, forward_time=0.098, loss_ctc=33.389, loss_att=16.356, acc=0.848, loss=21.466, backward_time=0.051, grad_norm=191.661, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.259e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:35:23,245 (trainer:732) INFO: 83epoch:train:1075-1432batch: iter_time=0.004, forward_time=0.098, loss_ctc=32.326, loss_att=15.879, acc=0.850, loss=20.813, backward_time=0.051, grad_norm=188.534, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.258e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:35:37,571 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:36:52,488 (trainer:732) INFO: 83epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.097, loss_ctc=32.223, loss_att=15.829, acc=0.849, loss=20.747, backward_time=0.051, grad_norm=188.561, clip=100.000, loss_scale=586.577, optim_step_time=0.032, optim0_lr0=2.258e-05, train_time=0.249
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:38:23,384 (trainer:732) INFO: 83epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.098, loss_ctc=32.570, loss_att=15.997, acc=0.849, loss=20.969, backward_time=0.051, grad_norm=198.222, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.257e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:39:40,654 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:39:52,193 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:39:54,401 (trainer:732) INFO: 83epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.098, loss_ctc=31.617, loss_att=15.523, acc=0.849, loss=20.351, backward_time=0.052, grad_norm=191.687, clip=100.000, loss_scale=472.560, optim_step_time=0.032, optim0_lr0=2.256e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:41:25,510 (trainer:732) INFO: 83epoch:train:2507-2864batch: iter_time=0.006, forward_time=0.097, loss_ctc=31.390, loss_att=15.403, acc=0.852, loss=20.199, backward_time=0.051, grad_norm=185.479, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.256e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:42:55,985 (trainer:732) INFO: 83epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.097, loss_ctc=33.792, loss_att=16.628, acc=0.846, loss=21.778, backward_time=0.052, grad_norm=186.278, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.255e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:44:27,251 (trainer:732) INFO: 83epoch:train:3223-3580batch: iter_time=0.008, forward_time=0.097, loss_ctc=31.581, loss_att=15.481, acc=0.850, loss=20.311, backward_time=0.052, grad_norm=185.662, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.254e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:45:58,411 (trainer:732) INFO: 83epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.097, loss_ctc=32.238, loss_att=15.875, acc=0.850, loss=20.784, backward_time=0.052, grad_norm=190.614, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.254e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:46:26,543 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:47:29,557 (trainer:732) INFO: 83epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.097, loss_ctc=32.718, loss_att=16.125, acc=0.848, loss=21.103, backward_time=0.051, grad_norm=185.329, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.253e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:49:01,221 (trainer:732) INFO: 83epoch:train:4297-4654batch: iter_time=0.008, forward_time=0.097, loss_ctc=32.003, loss_att=15.709, acc=0.849, loss=20.597, backward_time=0.051, grad_norm=186.207, clip=100.000, loss_scale=401.162, optim_step_time=0.033, optim0_lr0=2.252e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:50:33,044 (trainer:732) INFO: 83epoch:train:4655-5012batch: iter_time=0.009, forward_time=0.097, loss_ctc=31.030, loss_att=15.260, acc=0.853, loss=19.991, backward_time=0.051, grad_norm=182.331, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.252e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:52:04,927 (trainer:732) INFO: 83epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.098, loss_ctc=32.722, loss_att=16.171, acc=0.848, loss=21.136, backward_time=0.051, grad_norm=190.653, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.251e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:53:38,735 (trainer:732) INFO: 83epoch:train:5371-5728batch: iter_time=0.013, forward_time=0.098, loss_ctc=30.898, loss_att=15.155, acc=0.848, loss=19.878, backward_time=0.052, grad_norm=189.348, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.250e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:55:12,429 (trainer:732) INFO: 83epoch:train:5729-6086batch: iter_time=0.014, forward_time=0.097, loss_ctc=30.839, loss_att=15.171, acc=0.850, loss=19.871, backward_time=0.051, grad_norm=181.649, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.249e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:55:30,462 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:56:47,728 (trainer:732) INFO: 83epoch:train:6087-6444batch: iter_time=0.016, forward_time=0.098, loss_ctc=30.776, loss_att=15.141, acc=0.851, loss=19.831, backward_time=0.051, grad_norm=187.237, clip=100.000, loss_scale=303.328, optim_step_time=0.032, optim0_lr0=2.249e-05, train_time=0.266
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:57:00,644 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:58:20,501 (trainer:732) INFO: 83epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.098, loss_ctc=33.086, loss_att=16.322, acc=0.850, loss=21.351, backward_time=0.051, grad_norm=193.549, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.248e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 04:59:53,206 (trainer:732) INFO: 83epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.097, loss_ctc=32.674, loss_att=16.046, acc=0.850, loss=21.034, backward_time=0.051, grad_norm=191.751, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.247e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:01:01,698 (trainer:338) INFO: 83epoch results: [train] iter_time=0.007, forward_time=0.098, loss_ctc=32.059, loss_att=15.756, acc=0.849, loss=20.647, backward_time=0.051, grad_norm=188.520, clip=100.000, loss_scale=408.141, optim_step_time=0.033, optim0_lr0=2.254e-05, train_time=0.256, time=30 minutes and 33.73 seconds, total_count=594363, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.724, cer_ctc=0.076, loss_att=7.729, acc=0.925, cer=0.047, wer=0.669, loss=9.827, time=14.35 seconds, total_count=4399, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.48 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:01:05,269 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:01:05,285 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/73epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:01:05,286 (trainer:272) INFO: 84/100epoch started. Estimated time to finish: 9 hours, 2 minutes and 36.76 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:02:35,351 (trainer:732) INFO: 84epoch:train:1-358batch: iter_time=0.003, forward_time=0.098, loss_ctc=31.762, loss_att=15.612, acc=0.850, loss=20.457, backward_time=0.051, grad_norm=185.803, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.247e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:04:05,888 (trainer:732) INFO: 84epoch:train:359-716batch: iter_time=0.002, forward_time=0.099, loss_ctc=31.784, loss_att=15.639, acc=0.850, loss=20.482, backward_time=0.051, grad_norm=184.394, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.246e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:05:36,463 (trainer:732) INFO: 84epoch:train:717-1074batch: iter_time=0.006, forward_time=0.097, loss_ctc=30.511, loss_att=14.928, acc=0.853, loss=19.603, backward_time=0.051, grad_norm=183.813, clip=100.000, loss_scale=314.637, optim_step_time=0.032, optim0_lr0=2.245e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:07:04,797 (trainer:732) INFO: 84epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.096, loss_ctc=30.469, loss_att=15.019, acc=0.851, loss=19.654, backward_time=0.052, grad_norm=185.008, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.245e-05, train_time=0.246
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:08:35,971 (trainer:732) INFO: 84epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.099, loss_ctc=34.045, loss_att=16.727, acc=0.847, loss=21.922, backward_time=0.051, grad_norm=192.736, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.244e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:09:26,777 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:10:06,272 (trainer:732) INFO: 84epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.098, loss_ctc=33.162, loss_att=16.270, acc=0.850, loss=21.338, backward_time=0.051, grad_norm=189.016, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.243e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:11:36,784 (trainer:732) INFO: 84epoch:train:2149-2506batch: iter_time=0.005, forward_time=0.098, loss_ctc=30.454, loss_att=14.937, acc=0.853, loss=19.592, backward_time=0.051, grad_norm=184.128, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.243e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:12:51,571 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:13:09,288 (trainer:732) INFO: 84epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.103, loss_ctc=31.904, loss_att=15.651, acc=0.851, loss=20.527, backward_time=0.052, grad_norm=191.549, clip=100.000, loss_scale=461.087, optim_step_time=0.032, optim0_lr0=2.242e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:14:43,342 (trainer:732) INFO: 84epoch:train:2865-3222batch: iter_time=0.010, forward_time=0.101, loss_ctc=31.987, loss_att=15.667, acc=0.849, loss=20.563, backward_time=0.052, grad_norm=192.741, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.241e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:16:17,266 (trainer:732) INFO: 84epoch:train:3223-3580batch: iter_time=0.009, forward_time=0.100, loss_ctc=31.511, loss_att=15.439, acc=0.851, loss=20.261, backward_time=0.052, grad_norm=182.613, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.241e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:16:32,291 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:17:49,503 (trainer:732) INFO: 84epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.098, loss_ctc=31.920, loss_att=15.715, acc=0.850, loss=20.577, backward_time=0.051, grad_norm=189.619, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.240e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:19:23,093 (trainer:732) INFO: 84epoch:train:3939-4296batch: iter_time=0.009, forward_time=0.100, loss_ctc=31.871, loss_att=15.641, acc=0.849, loss=20.510, backward_time=0.052, grad_norm=191.670, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.239e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:20:55,911 (trainer:732) INFO: 84epoch:train:4297-4654batch: iter_time=0.011, forward_time=0.098, loss_ctc=32.031, loss_att=15.713, acc=0.850, loss=20.608, backward_time=0.051, grad_norm=187.144, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.239e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:21:09,004 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:22:26,922 (trainer:732) INFO: 84epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.097, loss_ctc=32.700, loss_att=16.041, acc=0.848, loss=21.039, backward_time=0.051, grad_norm=191.771, clip=100.000, loss_scale=412.603, optim_step_time=0.032, optim0_lr0=2.238e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:23:59,958 (trainer:732) INFO: 84epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.100, loss_ctc=32.751, loss_att=16.112, acc=0.851, loss=21.104, backward_time=0.051, grad_norm=190.385, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.237e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:25:32,444 (trainer:732) INFO: 84epoch:train:5371-5728batch: iter_time=0.007, forward_time=0.099, loss_ctc=33.394, loss_att=16.419, acc=0.846, loss=21.511, backward_time=0.051, grad_norm=190.476, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.237e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:25:53,239 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:27:04,867 (trainer:732) INFO: 84epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.099, loss_ctc=33.007, loss_att=16.248, acc=0.849, loss=21.276, backward_time=0.051, grad_norm=200.547, clip=100.000, loss_scale=312.650, optim_step_time=0.032, optim0_lr0=2.236e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:28:38,978 (trainer:732) INFO: 84epoch:train:6087-6444batch: iter_time=0.013, forward_time=0.098, loss_ctc=31.302, loss_att=15.352, acc=0.853, loss=20.137, backward_time=0.051, grad_norm=180.895, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.235e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:30:12,685 (trainer:732) INFO: 84epoch:train:6445-6802batch: iter_time=0.011, forward_time=0.099, loss_ctc=31.423, loss_att=15.419, acc=0.849, loss=20.220, backward_time=0.051, grad_norm=189.896, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.235e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:31:45,573 (trainer:732) INFO: 84epoch:train:6803-7160batch: iter_time=0.012, forward_time=0.097, loss_ctc=31.288, loss_att=15.396, acc=0.850, loss=20.164, backward_time=0.052, grad_norm=186.043, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.234e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:32:54,184 (trainer:338) INFO: 84epoch results: [train] iter_time=0.007, forward_time=0.099, loss_ctc=31.944, loss_att=15.687, acc=0.850, loss=20.564, backward_time=0.051, grad_norm=188.503, clip=100.000, loss_scale=356.626, optim_step_time=0.033, optim0_lr0=2.240e-05, train_time=0.257, time=30 minutes and 40.9 seconds, total_count=601524, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.512, cer_ctc=0.074, loss_att=7.633, acc=0.926, cer=0.047, wer=0.659, 
loss=9.696, time=14.66 seconds, total_count=4452, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.33 seconds, total_count=0, gpu_max_cached_mem_GB=26.869 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:32:57,945 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:32:57,978 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/74epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:32:57,978 (trainer:272) INFO: 85/100epoch started. Estimated time to finish: 8 hours, 30 minutes and 38.44 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:34:28,571 (trainer:732) INFO: 85epoch:train:1-358batch: iter_time=0.005, forward_time=0.098, loss_ctc=32.255, loss_att=15.847, acc=0.849, loss=20.770, backward_time=0.051, grad_norm=194.811, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.233e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:35:08,599 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:35:58,331 (trainer:732) INFO: 85epoch:train:359-716batch: iter_time=0.002, forward_time=0.099, loss_ctc=30.439, loss_att=14.892, acc=0.853, loss=19.556, backward_time=0.051, grad_norm=190.828, clip=100.000, loss_scale=305.341, optim_step_time=0.032, optim0_lr0=2.233e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:37:29,305 (trainer:732) INFO: 85epoch:train:717-1074batch: iter_time=0.004, forward_time=0.099, loss_ctc=30.583, loss_att=14.991, acc=0.854, loss=19.668, backward_time=0.051, grad_norm=181.762, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.232e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:37:45,900 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:38:58,908 (trainer:732) INFO: 85epoch:train:1075-1432batch: iter_time=5.143e-04, forward_time=0.098, loss_ctc=33.139, loss_att=16.221, acc=0.848, loss=21.296, backward_time=0.051, grad_norm=195.659, clip=100.000, loss_scale=302.611, optim_step_time=0.033, optim0_lr0=2.231e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:40:30,798 (trainer:732) INFO: 85epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.100, loss_ctc=33.016, loss_att=16.224, acc=0.848, loss=21.262, backward_time=0.051, grad_norm=193.043, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.231e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:42:02,418 (trainer:732) INFO: 85epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.100, loss_ctc=32.504, loss_att=15.990, acc=0.851, loss=20.944, backward_time=0.051, grad_norm=195.973, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.230e-05, train_time=0.256 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:43:33,003 (trainer:732) INFO: 85epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.099, loss_ctc=31.500, loss_att=15.439, acc=0.849, loss=20.257, backward_time=0.051, grad_norm=189.915, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.229e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:45:05,906 (trainer:732) INFO: 85epoch:train:2507-2864batch: iter_time=0.009, forward_time=0.099, loss_ctc=31.483, loss_att=15.390, acc=0.852, loss=20.218, backward_time=0.051, grad_norm=187.632, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.229e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:46:35,967 (trainer:732) INFO: 85epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.098, loss_ctc=31.493, loss_att=15.477, acc=0.850, loss=20.282, backward_time=0.052, grad_norm=186.877, clip=100.000, loss_scale=314.637, optim_step_time=0.032, optim0_lr0=2.228e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:48:09,085 (trainer:732) INFO: 85epoch:train:3223-3580batch: iter_time=0.010, forward_time=0.099, loss_ctc=30.078, loss_att=14.751, acc=0.855, loss=19.349, backward_time=0.051, grad_norm=183.040, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.227e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:49:41,457 (trainer:732) INFO: 85epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.100, loss_ctc=31.761, loss_att=15.597, acc=0.852, loss=20.446, backward_time=0.051, grad_norm=190.056, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.227e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:51:12,948 (trainer:732) INFO: 85epoch:train:3939-4296batch: iter_time=0.006, 
forward_time=0.099, loss_ctc=31.588, loss_att=15.504, acc=0.851, loss=20.329, backward_time=0.051, grad_norm=188.251, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.226e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:52:10,782 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:52:43,103 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:52:44,399 (trainer:732) INFO: 85epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.100, loss_ctc=33.022, loss_att=16.232, acc=0.850, loss=21.269, backward_time=0.051, grad_norm=189.476, clip=100.000, loss_scale=418.062, optim_step_time=0.032, optim0_lr0=2.225e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:54:16,927 (trainer:732) INFO: 85epoch:train:4655-5012batch: iter_time=0.007, forward_time=0.099, loss_ctc=31.268, loss_att=15.430, acc=0.853, loss=20.182, backward_time=0.052, grad_norm=187.412, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.225e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:55:49,540 (trainer:732) INFO: 85epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.100, loss_ctc=31.873, loss_att=15.682, acc=0.850, loss=20.539, backward_time=0.052, grad_norm=185.414, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.224e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:57:21,521 (trainer:732) INFO: 85epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.099, loss_ctc=32.317, loss_att=15.880, acc=0.849, loss=20.811, backward_time=0.052, grad_norm=183.629, 
clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.223e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 05:58:54,995 (trainer:732) INFO: 85epoch:train:5729-6086batch: iter_time=0.010, forward_time=0.099, loss_ctc=31.743, loss_att=15.598, acc=0.850, loss=20.441, backward_time=0.052, grad_norm=184.291, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.223e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:00:27,312 (trainer:732) INFO: 85epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.098, loss_ctc=33.314, loss_att=16.350, acc=0.848, loss=21.439, backward_time=0.051, grad_norm=190.966, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.222e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:01:38,990 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:02:00,505 (trainer:732) INFO: 85epoch:train:6445-6802batch: iter_time=0.011, forward_time=0.097, loss_ctc=32.367, loss_att=15.888, acc=0.848, loss=20.832, backward_time=0.051, grad_norm=189.224, clip=100.000, loss_scale=455.508, optim_step_time=0.032, optim0_lr0=2.222e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:03:34,168 (trainer:732) INFO: 85epoch:train:6803-7160batch: iter_time=0.011, forward_time=0.098, loss_ctc=31.840, loss_att=15.682, acc=0.849, loss=20.529, backward_time=0.052, grad_norm=195.240, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.221e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:04:42,544 (trainer:338) INFO: 85epoch results: [train] iter_time=0.006, forward_time=0.099, loss_ctc=31.864, loss_att=15.646, acc=0.850, loss=20.511, backward_time=0.051, grad_norm=189.174, clip=100.000, loss_scale=345.827, optim_step_time=0.033, optim0_lr0=2.227e-05, train_time=0.256, time=30 minutes and 36.85 seconds, total_count=608685, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.533, cer_ctc=0.075, loss_att=7.659, acc=0.925, cer=0.047, wer=0.664, loss=9.721, time=14.43 seconds, total_count=4505, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.28 seconds, total_count=0, gpu_max_cached_mem_GB=26.869 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:04:46,263 (trainer:384) INFO: There are no improvements in this epoch [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:04:46,285 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/76epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:04:46,285 (trainer:272) INFO: 86/100epoch started. 
Estimated time to finish: 7 hours, 58 minutes and 35.93 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:06:01,061 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:06:17,703 (trainer:732) INFO: 86epoch:train:1-358batch: iter_time=0.002, forward_time=0.102, loss_ctc=31.946, loss_att=15.700, acc=0.852, loss=20.573, backward_time=0.051, grad_norm=189.708, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.220e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:07:48,725 (trainer:732) INFO: 86epoch:train:359-716batch: iter_time=0.001, forward_time=0.101, loss_ctc=31.991, loss_att=15.675, acc=0.853, loss=20.570, backward_time=0.052, grad_norm=189.671, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.220e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:08:45,657 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:09:20,729 (trainer:732) INFO: 86epoch:train:717-1074batch: iter_time=0.003, forward_time=0.101, loss_ctc=31.925, loss_att=15.699, acc=0.851, loss=20.567, backward_time=0.051, grad_norm=190.962, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.219e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:10:40,499 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:10:51,279 (trainer:732) INFO: 86epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.099, loss_ctc=31.407, loss_att=15.416, acc=0.850, loss=20.213, backward_time=0.052, grad_norm=184.107, clip=100.000, loss_scale=549.289, optim_step_time=0.032, optim0_lr0=2.218e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:12:21,756 (trainer:732) INFO: 86epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.098, loss_ctc=32.200, loss_att=15.790, acc=0.851, loss=20.713, backward_time=0.051, grad_norm=185.558, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.218e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:13:51,002 (trainer:732) INFO: 86epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.097, loss_ctc=30.988, loss_att=15.137, acc=0.852, loss=19.892, backward_time=0.051, grad_norm=180.921, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.217e-05, train_time=0.249 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:15:20,969 (trainer:732) INFO: 86epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.097, loss_ctc=31.727, loss_att=15.567, acc=0.851, loss=20.415, backward_time=0.051, grad_norm=185.625, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.216e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:16:10,927 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:16:52,388 (trainer:732) INFO: 86epoch:train:2507-2864batch: iter_time=0.006, forward_time=0.098, loss_ctc=31.354, loss_att=15.413, acc=0.852, loss=20.195, backward_time=0.051, grad_norm=189.551, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.216e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:18:23,943 (trainer:732) INFO: 86epoch:train:2865-3222batch: iter_time=0.007, forward_time=0.098, loss_ctc=30.475, loss_att=14.940, acc=0.853, loss=19.600, backward_time=0.051, grad_norm=182.516, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.215e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:19:18,630 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:19:55,945 (trainer:732) INFO: 86epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.099, loss_ctc=32.153, loss_att=15.706, acc=0.854, loss=20.640, backward_time=0.052, grad_norm=186.723, clip=100.000, loss_scale=577.972, optim_step_time=0.033, optim0_lr0=2.214e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:21:25,804 (trainer:732) INFO: 86epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.097, loss_ctc=31.351, loss_att=15.344, acc=0.853, loss=20.146, backward_time=0.051, grad_norm=187.499, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.214e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:22:58,669 (trainer:732) INFO: 86epoch:train:3939-4296batch: iter_time=0.008, forward_time=0.099, loss_ctc=30.334, loss_att=14.857, acc=0.854, loss=19.500, backward_time=0.052, grad_norm=187.835, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.213e-05, train_time=0.259 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:24:31,371 (trainer:732) INFO: 86epoch:train:4297-4654batch: iter_time=0.007, forward_time=0.099, loss_ctc=32.646, loss_att=16.071, acc=0.848, loss=21.043, backward_time=0.051, grad_norm=198.066, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.212e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:25:44,224 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:26:04,573 (trainer:732) INFO: 86epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.104, loss_ctc=31.555, loss_att=15.486, acc=0.853, loss=20.307, backward_time=0.051, grad_norm=187.884, clip=100.000, loss_scale=453.916, optim_step_time=0.032, optim0_lr0=2.212e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:27:36,796 (trainer:732) INFO: 86epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.098, loss_ctc=32.683, loss_att=16.071, acc=0.847, loss=21.055, backward_time=0.052, grad_norm=194.176, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.211e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:29:10,619 (trainer:732) INFO: 86epoch:train:5371-5728batch: iter_time=0.010, forward_time=0.099, loss_ctc=32.274, loss_att=15.848, acc=0.852, loss=20.776, backward_time=0.051, grad_norm=194.869, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.210e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:30:46,230 (trainer:732) INFO: 86epoch:train:5729-6086batch: iter_time=0.012, forward_time=0.101, loss_ctc=32.345, loss_att=15.858, acc=0.851, loss=20.804, backward_time=0.051, grad_norm=194.577, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.210e-05, train_time=0.267 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:32:18,978 (trainer:732) INFO: 86epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.101, loss_ctc=31.482, loss_att=15.527, acc=0.850, loss=20.313, backward_time=0.052, grad_norm=187.142, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.209e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:33:51,389 (trainer:732) INFO: 86epoch:train:6445-6802batch: iter_time=0.011, forward_time=0.097, loss_ctc=30.988, loss_att=15.232, acc=0.851, loss=19.958, backward_time=0.051, grad_norm=184.984, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.209e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:35:24,205 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:35:24,226 (trainer:732) INFO: 86epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.099, loss_ctc=31.981, loss_att=15.695, acc=0.849, loss=20.581, backward_time=0.052, grad_norm=195.667, clip=100.000, loss_scale=419.496, optim_step_time=0.033, optim0_lr0=2.208e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:36:33,130 (trainer:338) INFO: 86epoch results: [train] iter_time=0.006, forward_time=0.099, loss_ctc=31.685, loss_att=15.549, acc=0.851, loss=20.390, backward_time=0.051, grad_norm=188.900, clip=100.000, loss_scale=445.577, optim_step_time=0.033, optim0_lr0=2.214e-05, train_time=0.256, time=30 minutes and 38.63 seconds, total_count=615846, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.481, cer_ctc=0.075, loss_att=7.630, acc=0.925, cer=0.047, wer=0.662, loss=9.686, time=14.85 seconds, total_count=4558, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.37 seconds, total_count=0, gpu_max_cached_mem_GB=26.869 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:36:36,909 (trainer:384) INFO: There are no improvements in this epoch [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:36:36,943 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/75epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:36:36,943 (trainer:272) INFO: 87/100epoch started. Estimated time to finish: 7 hours, 26 minutes and 37.79 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:38:07,575 (trainer:732) INFO: 87epoch:train:1-358batch: iter_time=0.003, forward_time=0.098, loss_ctc=32.747, loss_att=16.034, acc=0.851, loss=21.048, backward_time=0.051, grad_norm=196.729, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.207e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:39:37,764 (trainer:732) INFO: 87epoch:train:359-716batch: iter_time=0.002, forward_time=0.098, loss_ctc=32.070, loss_att=15.719, acc=0.851, loss=20.625, backward_time=0.051, grad_norm=188.206, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.207e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:41:06,974 (trainer:732) INFO: 87epoch:train:717-1074batch: iter_time=0.002, forward_time=0.097, loss_ctc=30.528, loss_att=14.862, acc=0.854, loss=19.562, backward_time=0.051, grad_norm=189.690, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.206e-05, train_time=0.249 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:42:36,747 (trainer:732) INFO: 87epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.097, loss_ctc=32.746, loss_att=16.116, acc=0.852, loss=21.105, backward_time=0.051, grad_norm=191.343, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.205e-05, train_time=0.251 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:42:40,886 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:44:07,080 (trainer:732) INFO: 87epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.098, loss_ctc=32.459, loss_att=15.924, acc=0.850, loss=20.884, backward_time=0.051, grad_norm=186.556, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.205e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:45:37,522 (trainer:732) INFO: 87epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.098, loss_ctc=31.384, loss_att=15.364, acc=0.854, loss=20.170, backward_time=0.051, grad_norm=190.857, clip=100.000, loss_scale=362.547, optim_step_time=0.033, optim0_lr0=2.204e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:47:00,828 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:47:07,393 (trainer:732) INFO: 87epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.097, loss_ctc=29.839, loss_att=14.605, acc=0.857, loss=19.175, backward_time=0.051, grad_norm=187.078, clip=100.000, loss_scale=492.639, optim_step_time=0.032, optim0_lr0=2.203e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:48:37,137 (trainer:732) INFO: 87epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.097, loss_ctc=32.491, loss_att=15.974, acc=0.850, loss=20.929, backward_time=0.051, grad_norm=192.216, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.203e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:50:09,589 (trainer:732) INFO: 87epoch:train:2865-3222batch: iter_time=0.008, forward_time=0.098, loss_ctc=30.489, loss_att=14.964, acc=0.854, loss=19.622, backward_time=0.051, grad_norm=186.371, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.202e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:51:40,374 (trainer:732) INFO: 87epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.097, loss_ctc=32.605, loss_att=15.971, acc=0.851, loss=20.961, backward_time=0.051, grad_norm=189.607, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.202e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:53:11,670 (trainer:732) INFO: 87epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.098, loss_ctc=31.533, loss_att=15.486, acc=0.851, loss=20.300, backward_time=0.051, grad_norm=191.091, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.201e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:54:42,403 (trainer:732) INFO: 87epoch:train:3939-4296batch: iter_time=0.003, 
forward_time=0.099, loss_ctc=32.927, loss_att=16.118, acc=0.850, loss=21.161, backward_time=0.051, grad_norm=193.574, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.200e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:56:14,657 (trainer:732) INFO: 87epoch:train:4297-4654batch: iter_time=0.008, forward_time=0.098, loss_ctc=31.182, loss_att=15.360, acc=0.851, loss=20.106, backward_time=0.051, grad_norm=188.800, clip=100.000, loss_scale=381.140, optim_step_time=0.032, optim0_lr0=2.200e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:57:46,323 (trainer:732) INFO: 87epoch:train:4655-5012batch: iter_time=0.010, forward_time=0.097, loss_ctc=30.688, loss_att=15.030, acc=0.853, loss=19.728, backward_time=0.051, grad_norm=185.025, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.199e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 06:59:17,843 (trainer:732) INFO: 87epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.099, loss_ctc=31.456, loss_att=15.416, acc=0.851, loss=20.228, backward_time=0.052, grad_norm=188.144, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.198e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:00:49,137 (trainer:732) INFO: 87epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.098, loss_ctc=30.557, loss_att=14.960, acc=0.854, loss=19.639, backward_time=0.052, grad_norm=185.151, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.198e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:01:22,102 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:02:22,331 (trainer:732) INFO: 87epoch:train:5729-6086batch: iter_time=0.009, forward_time=0.099, loss_ctc=31.164, loss_att=15.340, acc=0.851, loss=20.087, backward_time=0.052, grad_norm=192.033, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.197e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:03:56,233 (trainer:732) INFO: 87epoch:train:6087-6444batch: iter_time=0.010, forward_time=0.099, loss_ctc=31.926, loss_att=15.673, acc=0.852, loss=20.549, backward_time=0.052, grad_norm=189.212, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.196e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:04:06,510 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:04:38,024 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:05:28,623 (trainer:732) INFO: 87epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.098, loss_ctc=30.848, loss_att=15.153, acc=0.853, loss=19.862, backward_time=0.051, grad_norm=185.567, clip=100.000, loss_scale=520.605, optim_step_time=0.033, optim0_lr0=2.196e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:07:01,864 (trainer:732) INFO: 87epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.099, loss_ctc=31.831, loss_att=15.636, acc=0.851, loss=20.495, backward_time=0.052, grad_norm=185.764, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.195e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:08:10,566 (trainer:338) INFO: 87epoch results: [train] iter_time=0.005, forward_time=0.098, loss_ctc=31.555, loss_att=15.476, acc=0.852, loss=20.300, backward_time=0.051, grad_norm=189.146, clip=100.000, loss_scale=369.428, optim_step_time=0.033, optim0_lr0=2.201e-05, train_time=0.255, time=30 minutes and 25.6 seconds, total_count=623007, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.596, cer_ctc=0.075, loss_att=7.652, acc=0.925, cer=0.047, wer=0.655, loss=9.736, time=14.49 seconds, total_count=4611, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.53 seconds, total_count=0, gpu_max_cached_mem_GB=26.869 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:08:14,185 (trainer:384) INFO: There are no improvements in this epoch [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:08:14,217 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/77epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:08:14,217 (trainer:272) INFO: 88/100epoch started. 
Estimated time to finish: 6 hours, 54 minutes and 29.06 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:09:45,070 (trainer:732) INFO: 88epoch:train:1-358batch: iter_time=0.003, forward_time=0.101, loss_ctc=30.773, loss_att=15.020, acc=0.854, loss=19.746, backward_time=0.051, grad_norm=187.617, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.195e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:11:16,863 (trainer:732) INFO: 88epoch:train:359-716batch: iter_time=0.003, forward_time=0.102, loss_ctc=30.684, loss_att=14.900, acc=0.856, loss=19.635, backward_time=0.051, grad_norm=184.133, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.194e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:12:46,311 (trainer:732) INFO: 88epoch:train:717-1074batch: iter_time=0.003, forward_time=0.097, loss_ctc=31.029, loss_att=15.211, acc=0.854, loss=19.956, backward_time=0.051, grad_norm=189.989, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.193e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:14:09,240 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:14:16,998 (trainer:732) INFO: 88epoch:train:1075-1432batch: iter_time=0.005, forward_time=0.098, loss_ctc=30.181, loss_att=14.806, acc=0.855, loss=19.419, backward_time=0.052, grad_norm=185.452, clip=100.000, loss_scale=619.563, optim_step_time=0.033, optim0_lr0=2.193e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:14:24,317 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:15:46,868 (trainer:732) INFO: 88epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.098, loss_ctc=32.743, loss_att=16.061, acc=0.853, loss=21.065, backward_time=0.051, grad_norm=194.064, clip=100.000, loss_scale=276.796, optim_step_time=0.032, optim0_lr0=2.192e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:17:18,328 (trainer:732) INFO: 88epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.100, loss_ctc=31.045, loss_att=15.235, acc=0.853, loss=19.978, backward_time=0.052, grad_norm=188.399, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.191e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:18:49,726 (trainer:732) INFO: 88epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.099, loss_ctc=31.036, loss_att=15.229, acc=0.853, loss=19.971, backward_time=0.051, grad_norm=182.457, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.191e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:20:20,400 (trainer:732) INFO: 88epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.098, loss_ctc=30.705, loss_att=15.077, acc=0.854, loss=19.765, backward_time=0.051, grad_norm=187.304, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.190e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:21:50,612 (trainer:732) INFO: 88epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.098, loss_ctc=31.855, loss_att=15.617, acc=0.850, loss=20.489, backward_time=0.052, grad_norm=191.194, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.190e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:22:37,066 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:22:46,758 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:23:22,379 (trainer:732) INFO: 88epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.099, loss_ctc=30.821, loss_att=15.095, acc=0.854, loss=19.812, backward_time=0.051, grad_norm=181.335, clip=100.000, loss_scale=340.380, optim_step_time=0.032, optim0_lr0=2.189e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:24:53,417 (trainer:732) INFO: 88epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.098, loss_ctc=31.891, loss_att=15.657, acc=0.851, loss=20.527, backward_time=0.052, grad_norm=182.408, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.188e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:26:25,770 (trainer:732) INFO: 88epoch:train:3939-4296batch: iter_time=0.007, forward_time=0.099, loss_ctc=31.415, loss_att=15.394, acc=0.852, loss=20.200, backward_time=0.052, grad_norm=179.830, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.188e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:27:57,826 (trainer:732) INFO: 88epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.099, loss_ctc=32.060, loss_att=15.784, acc=0.850, loss=20.667, backward_time=0.052, grad_norm=186.984, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.187e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:29:29,036 (trainer:732) INFO: 88epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.098, loss_ctc=32.196, 
loss_att=15.744, acc=0.853, loss=20.680, backward_time=0.051, grad_norm=189.278, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.186e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:31:00,283 (trainer:732) INFO: 88epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.099, loss_ctc=32.887, loss_att=16.106, acc=0.850, loss=21.140, backward_time=0.052, grad_norm=193.606, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.186e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:31:42,218 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:32:33,251 (trainer:732) INFO: 88epoch:train:5371-5728batch: iter_time=0.007, forward_time=0.099, loss_ctc=31.414, loss_att=15.398, acc=0.854, loss=20.203, backward_time=0.052, grad_norm=186.435, clip=100.000, loss_scale=609.524, optim_step_time=0.032, optim0_lr0=2.185e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:33:12,380 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:33:53,879 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:34:06,380 (trainer:732) INFO: 88epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.099, loss_ctc=32.436, loss_att=15.913, acc=0.850, loss=20.870, backward_time=0.052, grad_norm=197.289, clip=100.000, loss_scale=478.297, optim_step_time=0.033, optim0_lr0=2.185e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:35:39,974 (trainer:732) INFO: 88epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.099, loss_ctc=31.219, loss_att=15.338, acc=0.854, loss=20.102, backward_time=0.051, grad_norm=191.695, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.184e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:37:11,219 (trainer:732) INFO: 88epoch:train:6445-6802batch: iter_time=0.008, forward_time=0.097, loss_ctc=31.366, loss_att=15.393, acc=0.851, loss=20.185, backward_time=0.051, grad_norm=186.960, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.183e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:38:43,353 (trainer:732) INFO: 88epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.097, loss_ctc=31.420, loss_att=15.486, acc=0.852, loss=20.266, backward_time=0.051, grad_norm=188.314, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.183e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:39:51,964 (trainer:338) INFO: 88epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=31.449, loss_att=15.418, acc=0.853, loss=20.227, backward_time=0.051, grad_norm=187.777, clip=100.000, loss_scale=410.559, optim_step_time=0.033, optim0_lr0=2.189e-05, train_time=0.255, time=30 minutes and 29.8 seconds, total_count=630168, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.384, cer_ctc=0.075, loss_att=7.565, acc=0.926, cer=0.047, wer=0.654, 
loss=9.611, time=14.46 seconds, total_count=4664, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.49 seconds, total_count=0, gpu_max_cached_mem_GB=26.869 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:39:55,571 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:39:55,603 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/79epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:39:55,603 (trainer:272) INFO: 89/100epoch started. Estimated time to finish: 6 hours, 22 minutes and 27.34 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:41:28,126 (trainer:732) INFO: 89epoch:train:1-358batch: iter_time=0.004, forward_time=0.103, loss_ctc=32.027, loss_att=15.655, acc=0.852, loss=20.567, backward_time=0.051, grad_norm=187.257, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.182e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:42:06,912 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:42:59,418 (trainer:732) INFO: 89epoch:train:359-716batch: iter_time=5.641e-04, forward_time=0.103, loss_ctc=31.682, loss_att=15.551, acc=0.855, loss=20.390, backward_time=0.051, grad_norm=193.395, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.181e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:44:30,955 (trainer:732) INFO: 89epoch:train:717-1074batch: iter_time=9.038e-04, forward_time=0.104, loss_ctc=31.825, loss_att=15.576, acc=0.852, loss=20.451, backward_time=0.051, grad_norm=187.842, clip=100.000, loss_scale=396.156, optim_step_time=0.032, optim0_lr0=2.181e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:45:45,475 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:46:03,672 (trainer:732) INFO: 89epoch:train:1075-1432batch: iter_time=1.903e-04, forward_time=0.105, loss_ctc=33.273, loss_att=16.229, acc=0.852, loss=21.342, backward_time=0.052, grad_norm=199.398, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.180e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:47:34,515 (trainer:732) INFO: 89epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.102, loss_ctc=30.285, loss_att=14.853, acc=0.853, loss=19.482, backward_time=0.051, grad_norm=188.724, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.180e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:49:08,142 (trainer:732) INFO: 89epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.105, loss_ctc=30.686, loss_att=15.040, acc=0.855, loss=19.734, backward_time=0.052, grad_norm=184.324, 
clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.179e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:50:40,189 (trainer:732) INFO: 89epoch:train:2149-2506batch: iter_time=0.001, forward_time=0.104, loss_ctc=31.421, loss_att=15.386, acc=0.854, loss=20.196, backward_time=0.052, grad_norm=185.111, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.178e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:52:13,968 (trainer:732) INFO: 89epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.105, loss_ctc=30.699, loss_att=15.106, acc=0.854, loss=19.784, backward_time=0.052, grad_norm=189.982, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.178e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:53:08,797 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:53:46,064 (trainer:732) INFO: 89epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.103, loss_ctc=31.429, loss_att=15.394, acc=0.856, loss=20.205, backward_time=0.052, grad_norm=184.390, clip=100.000, loss_scale=793.098, optim_step_time=0.033, optim0_lr0=2.177e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:54:34,163 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:55:18,437 (trainer:732) INFO: 89epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.101, loss_ctc=31.514, loss_att=15.445, acc=0.853, loss=20.266, backward_time=0.052, grad_norm=189.385, clip=100.000, loss_scale=390.095, optim_step_time=0.032, optim0_lr0=2.176e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:56:51,499 (trainer:732) INFO: 89epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.104, loss_ctc=31.673, loss_att=15.525, acc=0.852, loss=20.370, backward_time=0.052, grad_norm=188.733, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.176e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:58:23,888 (trainer:732) INFO: 89epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.100, loss_ctc=31.860, loss_att=15.643, acc=0.852, loss=20.508, backward_time=0.051, grad_norm=198.218, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.175e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 07:59:56,894 (trainer:732) INFO: 89epoch:train:4297-4654batch: iter_time=0.010, forward_time=0.098, loss_ctc=31.125, loss_att=15.306, acc=0.852, loss=20.052, backward_time=0.051, grad_norm=182.791, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.175e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:01:30,105 (trainer:732) INFO: 89epoch:train:4655-5012batch: iter_time=0.009, forward_time=0.099, loss_ctc=30.899, loss_att=15.096, acc=0.855, loss=19.837, backward_time=0.051, grad_norm=194.738, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.174e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:01:33,002 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:03:01,451 (trainer:732) INFO: 89epoch:train:5013-5370batch: iter_time=0.008, forward_time=0.097, loss_ctc=31.328, loss_att=15.333, acc=0.851, loss=20.131, backward_time=0.051, grad_norm=196.816, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.173e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:04:33,160 (trainer:732) INFO: 89epoch:train:5371-5728batch: iter_time=0.008, forward_time=0.097, loss_ctc=31.503, loss_att=15.416, acc=0.854, loss=20.242, backward_time=0.051, grad_norm=193.982, clip=100.000, loss_scale=483.397, optim_step_time=0.033, optim0_lr0=2.173e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:06:04,654 (trainer:732) INFO: 89epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.097, loss_ctc=31.383, loss_att=15.373, acc=0.854, loss=20.176, backward_time=0.051, grad_norm=189.228, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.172e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:07:36,284 (trainer:732) INFO: 89epoch:train:6087-6444batch: iter_time=0.008, forward_time=0.097, loss_ctc=31.317, loss_att=15.309, acc=0.854, loss=20.111, backward_time=0.051, grad_norm=186.220, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.172e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:09:10,664 (trainer:732) INFO: 89epoch:train:6445-6802batch: iter_time=0.014, forward_time=0.098, loss_ctc=29.730, loss_att=14.469, acc=0.858, loss=19.047, backward_time=0.051, grad_norm=184.277, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.171e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 
08:10:43,406 (trainer:732) INFO: 89epoch:train:6803-7160batch: iter_time=0.010, forward_time=0.098, loss_ctc=31.487, loss_att=15.456, acc=0.854, loss=20.265, backward_time=0.051, grad_norm=192.863, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.170e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:11:51,899 (trainer:338) INFO: 89epoch results: [train] iter_time=0.005, forward_time=0.101, loss_ctc=31.346, loss_att=15.352, acc=0.854, loss=20.150, backward_time=0.051, grad_norm=189.879, clip=100.000, loss_scale=423.103, optim_step_time=0.033, optim0_lr0=2.176e-05, train_time=0.258, time=30 minutes and 48.42 seconds, total_count=637329, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.406, cer_ctc=0.074, loss_att=7.590, acc=0.926, cer=0.047, wer=0.656, loss=9.635, time=14.46 seconds, total_count=4717, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.41 seconds, total_count=0, gpu_max_cached_mem_GB=26.869 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:11:55,706 (trainer:384) INFO: There are no improvements in this epoch [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:11:55,723 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/78epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:11:55,723 (trainer:272) INFO: 90/100epoch started. Estimated time to finish: 5 hours, 50 minutes and 40.14 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:13:28,585 (trainer:732) INFO: 90epoch:train:1-358batch: iter_time=0.003, forward_time=0.104, loss_ctc=31.035, loss_att=15.141, acc=0.855, loss=19.909, backward_time=0.051, grad_norm=187.663, clip=100.000, loss_scale=667.888, optim_step_time=0.032, optim0_lr0=2.170e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:13:32,201 (trainer:663) WARNING: The grad norm is inf. 
Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:14:58,265 (trainer:732) INFO: 90epoch:train:359-716batch: iter_time=0.001, forward_time=0.099, loss_ctc=29.907, loss_att=14.571, acc=0.856, loss=19.172, backward_time=0.051, grad_norm=182.906, clip=100.000, loss_scale=532.078, optim_step_time=0.033, optim0_lr0=2.169e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:15:25,198 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:16:28,434 (trainer:732) INFO: 90epoch:train:717-1074batch: iter_time=0.002, forward_time=0.098, loss_ctc=32.223, loss_att=15.719, acc=0.854, loss=20.670, backward_time=0.051, grad_norm=186.372, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.169e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:16:35,385 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:17:58,865 (trainer:732) INFO: 90epoch:train:1075-1432batch: iter_time=0.004, forward_time=0.098, loss_ctc=30.748, loss_att=14.987, acc=0.854, loss=19.715, backward_time=0.051, grad_norm=186.366, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.168e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:19:28,684 (trainer:732) INFO: 90epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.097, loss_ctc=30.378, loss_att=14.879, acc=0.856, loss=19.528, backward_time=0.051, grad_norm=182.879, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.167e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:20:59,700 (trainer:732) INFO: 90epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.098, loss_ctc=31.075, loss_att=15.211, acc=0.855, loss=19.970, backward_time=0.052, grad_norm=183.364, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.167e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:22:20,379 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:22:29,430 (trainer:732) INFO: 90epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.097, loss_ctc=32.291, loss_att=15.910, acc=0.851, loss=20.824, backward_time=0.051, grad_norm=188.992, clip=100.000, loss_scale=649.681, optim_step_time=0.032, optim0_lr0=2.166e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:24:00,699 (trainer:732) INFO: 90epoch:train:2507-2864batch: iter_time=0.005, forward_time=0.098, loss_ctc=31.149, loss_att=15.270, acc=0.855, loss=20.034, backward_time=0.051, grad_norm=189.248, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.165e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:25:31,683 (trainer:732) INFO: 90epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.098, loss_ctc=32.334, loss_att=15.869, acc=0.851, loss=20.809, backward_time=0.052, grad_norm=190.252, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.165e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:25:54,260 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:27:04,278 (trainer:732) INFO: 90epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.099, loss_ctc=32.583, loss_att=16.041, acc=0.850, loss=21.004, backward_time=0.052, grad_norm=196.506, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.164e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:28:35,544 (trainer:732) INFO: 90epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.098, loss_ctc=32.505, loss_att=15.934, acc=0.853, loss=20.906, backward_time=0.051, grad_norm=189.468, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.164e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:30:07,994 (trainer:732) INFO: 90epoch:train:3939-4296batch: iter_time=0.008, forward_time=0.099, loss_ctc=31.062, loss_att=15.228, acc=0.850, loss=19.978, backward_time=0.051, grad_norm=181.304, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.163e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:31:02,293 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:31:41,938 (trainer:732) INFO: 90epoch:train:4297-4654batch: iter_time=0.014, forward_time=0.098, loss_ctc=30.345, loss_att=14.847, acc=0.856, loss=19.496, backward_time=0.051, grad_norm=189.560, clip=100.000, loss_scale=559.328, optim_step_time=0.032, optim0_lr0=2.162e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:33:14,236 (trainer:732) INFO: 90epoch:train:4655-5012batch: iter_time=0.009, forward_time=0.097, loss_ctc=31.742, loss_att=15.531, acc=0.854, loss=20.395, backward_time=0.051, grad_norm=190.056, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.162e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:34:47,049 (trainer:732) INFO: 90epoch:train:5013-5370batch: iter_time=0.010, forward_time=0.098, loss_ctc=31.326, loss_att=15.390, acc=0.856, loss=20.171, backward_time=0.051, grad_norm=190.423, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.161e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:36:19,775 (trainer:732) INFO: 90epoch:train:5371-5728batch: iter_time=0.011, forward_time=0.097, loss_ctc=31.373, loss_att=15.322, acc=0.853, loss=20.137, backward_time=0.051, grad_norm=197.469, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.161e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:37:53,007 (trainer:732) INFO: 90epoch:train:5729-6086batch: iter_time=0.012, forward_time=0.097, loss_ctc=31.075, loss_att=15.239, acc=0.856, loss=19.990, backward_time=0.051, grad_norm=189.822, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.160e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:39:25,895 (trainer:732) INFO: 90epoch:train:6087-6444batch: iter_time=0.013, 
forward_time=0.096, loss_ctc=30.544, loss_att=14.918, acc=0.855, loss=19.605, backward_time=0.051, grad_norm=190.615, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.159e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:39:51,054 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:41:00,608 (trainer:732) INFO: 90epoch:train:6445-6802batch: iter_time=0.014, forward_time=0.098, loss_ctc=30.583, loss_att=15.000, acc=0.854, loss=19.675, backward_time=0.051, grad_norm=186.460, clip=100.000, loss_scale=557.894, optim_step_time=0.033, optim0_lr0=2.159e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:42:33,888 (trainer:732) INFO: 90epoch:train:6803-7160batch: iter_time=0.012, forward_time=0.097, loss_ctc=30.941, loss_att=15.160, acc=0.854, loss=19.895, backward_time=0.051, grad_norm=182.396, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.158e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:43:42,580 (trainer:338) INFO: 90epoch results: [train] iter_time=0.007, forward_time=0.098, loss_ctc=31.246, loss_att=15.300, acc=0.854, loss=20.084, backward_time=0.051, grad_norm=188.104, clip=100.000, loss_scale=532.317, optim_step_time=0.033, optim0_lr0=2.164e-05, train_time=0.256, time=30 minutes and 38.84 seconds, total_count=644490, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.299, cer_ctc=0.073, loss_att=7.571, acc=0.926, cer=0.046, wer=0.658, loss=9.589, time=14.61 seconds, total_count=4770, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.4 seconds, total_count=0, gpu_max_cached_mem_GB=26.869 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:43:46,302 (trainer:386) INFO: The best model has been updated: valid.acc 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:43:46,318 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/80epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:43:46,319 (trainer:272) INFO: 91/100epoch started. Estimated time to finish: 5 hours, 18 minutes and 46.2 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:43:49,163 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:45:17,462 (trainer:732) INFO: 91epoch:train:1-358batch: iter_time=0.003, forward_time=0.100, loss_ctc=30.482, loss_att=14.854, acc=0.856, loss=19.542, backward_time=0.052, grad_norm=185.173, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.158e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:46:48,419 (trainer:732) INFO: 91epoch:train:359-716batch: iter_time=0.002, forward_time=0.099, loss_ctc=31.530, loss_att=15.363, acc=0.855, loss=20.213, backward_time=0.051, grad_norm=189.386, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.157e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:48:19,398 (trainer:732) INFO: 91epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=31.473, loss_att=15.434, acc=0.855, loss=20.246, backward_time=0.051, grad_norm=188.282, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.156e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:49:49,767 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:49:50,028 (trainer:732) INFO: 91epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.099, loss_ctc=32.529, loss_att=15.879, acc=0.852, loss=20.874, backward_time=0.051, grad_norm=188.125, clip=100.000, loss_scale=589.445, optim_step_time=0.032, optim0_lr0=2.156e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:51:20,248 (trainer:732) INFO: 91epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.097, loss_ctc=31.727, loss_att=15.540, acc=0.854, loss=20.396, backward_time=0.051, grad_norm=187.443, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.155e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:52:47,178 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:52:50,489 (trainer:732) INFO: 91epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.098, loss_ctc=31.414, loss_att=15.410, acc=0.854, loss=20.211, backward_time=0.051, grad_norm=186.786, clip=100.000, loss_scale=503.395, optim_step_time=0.032, optim0_lr0=2.155e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:54:20,949 (trainer:732) INFO: 91epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.098, loss_ctc=31.600, loss_att=15.451, acc=0.857, loss=20.296, backward_time=0.051, grad_norm=188.746, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.154e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:55:51,074 (trainer:732) INFO: 91epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.098, loss_ctc=31.055, loss_att=15.219, acc=0.854, loss=19.970, backward_time=0.051, grad_norm=181.907, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.153e-05, train_time=0.251 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:57:21,408 (trainer:732) INFO: 91epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.097, loss_ctc=30.006, loss_att=14.725, acc=0.857, loss=19.309, backward_time=0.051, grad_norm=183.184, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.153e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 08:58:54,045 (trainer:732) INFO: 91epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.099, loss_ctc=30.548, loss_att=14.984, acc=0.856, loss=19.653, backward_time=0.051, grad_norm=189.709, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.152e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:00:25,452 (trainer:732) INFO: 91epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.098, loss_ctc=31.245, loss_att=15.303, acc=0.853, loss=20.085, backward_time=0.051, grad_norm=192.879, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.152e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:01:55,743 (trainer:732) INFO: 91epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.097, loss_ctc=31.760, loss_att=15.502, acc=0.854, loss=20.379, backward_time=0.052, grad_norm=192.938, clip=100.000, loss_scale=370.413, optim_step_time=0.032, optim0_lr0=2.151e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:03:27,637 (trainer:732) INFO: 91epoch:train:4297-4654batch: iter_time=0.008, forward_time=0.098, loss_ctc=30.719, loss_att=15.021, acc=0.854, loss=19.731, backward_time=0.051, grad_norm=187.299, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.151e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:04:49,028 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:05:00,138 (trainer:732) INFO: 91epoch:train:4655-5012batch: iter_time=0.009, forward_time=0.098, loss_ctc=29.088, loss_att=14.134, acc=0.859, loss=18.620, backward_time=0.051, grad_norm=184.551, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.150e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:06:31,947 (trainer:732) INFO: 91epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.099, loss_ctc=31.826, loss_att=15.658, acc=0.849, loss=20.508, backward_time=0.051, grad_norm=195.372, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.149e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:08:05,117 (trainer:732) INFO: 91epoch:train:5371-5728batch: iter_time=0.008, forward_time=0.099, loss_ctc=30.958, loss_att=15.193, acc=0.854, loss=19.923, backward_time=0.051, grad_norm=191.159, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.149e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:09:38,021 (trainer:732) INFO: 91epoch:train:5729-6086batch: iter_time=0.009, forward_time=0.098, loss_ctc=31.676, loss_att=15.536, acc=0.851, loss=20.378, backward_time=0.051, grad_norm=184.799, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.148e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:09:52,872 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:11:10,665 (trainer:732) INFO: 91epoch:train:6087-6444batch: iter_time=0.012, forward_time=0.097, loss_ctc=30.073, loss_att=14.748, acc=0.856, loss=19.345, backward_time=0.051, grad_norm=187.040, clip=100.000, loss_scale=522.039, optim_step_time=0.033, optim0_lr0=2.148e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:12:43,105 (trainer:732) INFO: 91epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.098, loss_ctc=30.601, loss_att=14.913, acc=0.858, loss=19.620, backward_time=0.051, grad_norm=190.367, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.147e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:13:00,137 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:13:29,585 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:14:16,460 (trainer:732) INFO: 91epoch:train:6803-7160batch: iter_time=0.006, forward_time=0.101, loss_ctc=32.297, loss_att=15.805, acc=0.855, loss=20.753, backward_time=0.052, grad_norm=200.770, clip=100.000, loss_scale=302.611, optim_step_time=0.034, optim0_lr0=2.146e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:15:25,040 (trainer:338) INFO: 91epoch results: [train] iter_time=0.006, forward_time=0.098, loss_ctc=31.117, loss_att=15.227, acc=0.855, loss=19.994, backward_time=0.051, grad_norm=188.784, clip=100.000, loss_scale=434.345, optim_step_time=0.033, optim0_lr0=2.152e-05, train_time=0.255, time=30 minutes and 30.76 seconds, total_count=651651, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.243, cer_ctc=0.073, loss_att=7.483, acc=0.928, cer=0.046, wer=0.650, loss=9.511, time=14.69 seconds, total_count=4823, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.27 seconds, total_count=0, gpu_max_cached_mem_GB=26.869 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:15:28,811 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:15:28,828 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/81epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:15:28,828 (trainer:272) INFO: 92/100epoch started. 
Estimated time to finish: 4 hours, 46 minutes and 48.8 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:16:59,459 (trainer:732) INFO: 92epoch:train:1-358batch: iter_time=0.004, forward_time=0.098, loss_ctc=30.432, loss_att=14.884, acc=0.855, loss=19.548, backward_time=0.051, grad_norm=188.836, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.146e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:18:29,832 (trainer:732) INFO: 92epoch:train:359-716batch: iter_time=0.002, forward_time=0.098, loss_ctc=31.366, loss_att=15.328, acc=0.857, loss=20.139, backward_time=0.051, grad_norm=194.382, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.145e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:20:00,589 (trainer:732) INFO: 92epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=31.686, loss_att=15.524, acc=0.854, loss=20.372, backward_time=0.051, grad_norm=188.558, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.145e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:21:30,317 (trainer:732) INFO: 92epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.098, loss_ctc=31.016, loss_att=15.116, acc=0.856, loss=19.886, backward_time=0.051, grad_norm=185.414, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.144e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:23:00,331 (trainer:732) INFO: 92epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.098, loss_ctc=30.953, loss_att=15.113, acc=0.858, loss=19.865, backward_time=0.052, grad_norm=182.699, clip=100.000, loss_scale=315.352, optim_step_time=0.033, optim0_lr0=2.143e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:24:29,115 (trainer:732) INFO: 
92epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.096, loss_ctc=30.042, loss_att=14.670, acc=0.856, loss=19.282, backward_time=0.051, grad_norm=183.075, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.143e-05, train_time=0.248 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:26:00,283 (trainer:732) INFO: 92epoch:train:2149-2506batch: iter_time=0.005, forward_time=0.098, loss_ctc=31.009, loss_att=15.134, acc=0.856, loss=19.896, backward_time=0.052, grad_norm=185.541, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.142e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:26:47,436 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:27:30,847 (trainer:732) INFO: 92epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.098, loss_ctc=32.697, loss_att=16.036, acc=0.853, loss=21.034, backward_time=0.051, grad_norm=194.797, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.142e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:29:01,965 (trainer:732) INFO: 92epoch:train:2865-3222batch: iter_time=0.006, forward_time=0.097, loss_ctc=31.132, loss_att=15.251, acc=0.854, loss=20.015, backward_time=0.051, grad_norm=182.447, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.141e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:30:33,569 (trainer:732) INFO: 92epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.098, loss_ctc=29.621, loss_att=14.527, acc=0.857, loss=19.055, backward_time=0.052, grad_norm=184.657, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.140e-05, train_time=0.256 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:31:44,933 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:32:03,940 (trainer:732) INFO: 92epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.097, loss_ctc=31.117, loss_att=15.206, acc=0.855, loss=19.979, backward_time=0.051, grad_norm=188.094, clip=100.000, loss_scale=734.297, optim_step_time=0.032, optim0_lr0=2.140e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:32:16,091 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:32:55,723 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:33:36,007 (trainer:732) INFO: 92epoch:train:3939-4296batch: iter_time=0.009, forward_time=0.098, loss_ctc=29.581, loss_att=14.408, acc=0.859, loss=18.960, backward_time=0.051, grad_norm=183.611, clip=100.000, loss_scale=399.417, optim_step_time=0.032, optim0_lr0=2.139e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:35:07,271 (trainer:732) INFO: 92epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.097, loss_ctc=31.764, loss_att=15.558, acc=0.853, loss=20.420, backward_time=0.052, grad_norm=186.144, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.139e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:36:38,493 (trainer:732) INFO: 92epoch:train:4655-5012batch: iter_time=0.003, forward_time=0.099, loss_ctc=32.361, loss_att=15.825, acc=0.851, loss=20.786, backward_time=0.052, grad_norm=195.208, clip=100.000, loss_scale=256.000, optim_step_time=0.032, 
optim0_lr0=2.138e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:38:08,439 (trainer:732) INFO: 92epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.097, loss_ctc=31.156, loss_att=15.278, acc=0.855, loss=20.041, backward_time=0.051, grad_norm=191.521, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.138e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:38:09,434 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:39:39,932 (trainer:732) INFO: 92epoch:train:5371-5728batch: iter_time=0.009, forward_time=0.097, loss_ctc=29.542, loss_att=14.479, acc=0.857, loss=18.998, backward_time=0.051, grad_norm=185.235, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.137e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:41:11,606 (trainer:732) INFO: 92epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.098, loss_ctc=30.774, loss_att=15.075, acc=0.857, loss=19.785, backward_time=0.051, grad_norm=189.940, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.136e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:42:45,058 (trainer:732) INFO: 92epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.099, loss_ctc=31.624, loss_att=15.543, acc=0.853, loss=20.368, backward_time=0.051, grad_norm=190.229, clip=100.000, loss_scale=474.101, optim_step_time=0.033, optim0_lr0=2.136e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:44:17,785 (trainer:732) INFO: 92epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.098, loss_ctc=30.606, loss_att=15.012, acc=0.857, loss=19.690, 
backward_time=0.051, grad_norm=189.501, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.135e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:45:50,373 (trainer:732) INFO: 92epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.098, loss_ctc=31.486, loss_att=15.409, acc=0.854, loss=20.232, backward_time=0.052, grad_norm=190.392, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.135e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:46:59,165 (trainer:338) INFO: 92epoch results: [train] iter_time=0.005, forward_time=0.098, loss_ctc=30.986, loss_att=15.163, acc=0.855, loss=19.910, backward_time=0.051, grad_norm=188.006, clip=100.000, loss_scale=390.526, optim_step_time=0.033, optim0_lr0=2.140e-05, train_time=0.254, time=30 minutes and 22.25 seconds, total_count=658812, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.095, cer_ctc=0.072, loss_att=7.455, acc=0.928, cer=0.046, wer=0.652, loss=9.447, time=14.66 seconds, total_count=4876, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.43 seconds, total_count=0, gpu_max_cached_mem_GB=26.869 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:47:02,808 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:47:02,842 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/82epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:47:02,842 (trainer:272) INFO: 93/100epoch started. Estimated time to finish: 4 hours, 14 minutes and 49.48 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:47:36,452 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:48:32,860 (trainer:732) INFO: 93epoch:train:1-358batch: iter_time=0.006, forward_time=0.096, loss_ctc=29.597, loss_att=14.494, acc=0.856, loss=19.025, backward_time=0.051, grad_norm=180.313, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.134e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:50:03,534 (trainer:732) INFO: 93epoch:train:359-716batch: iter_time=0.001, forward_time=0.099, loss_ctc=32.611, loss_att=15.904, acc=0.854, loss=20.917, backward_time=0.052, grad_norm=203.223, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.134e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:51:34,947 (trainer:732) INFO: 93epoch:train:717-1074batch: iter_time=0.005, forward_time=0.098, loss_ctc=30.341, loss_att=14.857, acc=0.857, loss=19.502, backward_time=0.052, grad_norm=187.544, clip=100.000, loss_scale=649.296, optim_step_time=0.033, optim0_lr0=2.133e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:51:50,274 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:53:05,583 (trainer:732) INFO: 93epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.098, loss_ctc=31.062, loss_att=15.205, acc=0.855, loss=19.962, backward_time=0.051, grad_norm=187.406, clip=100.000, loss_scale=598.050, optim_step_time=0.033, optim0_lr0=2.132e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:53:59,521 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:54:35,555 (trainer:732) INFO: 93epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.097, loss_ctc=30.957, loss_att=15.047, acc=0.855, loss=19.820, backward_time=0.052, grad_norm=186.689, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.132e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:56:05,842 (trainer:732) INFO: 93epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.097, loss_ctc=30.948, loss_att=15.156, acc=0.858, loss=19.893, backward_time=0.052, grad_norm=185.337, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.131e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:57:36,698 (trainer:732) INFO: 93epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.099, loss_ctc=32.785, loss_att=16.019, acc=0.853, loss=21.049, backward_time=0.051, grad_norm=197.074, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.131e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:58:10,461 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 09:59:07,802 (trainer:732) INFO: 93epoch:train:2507-2864batch: iter_time=0.006, forward_time=0.097, loss_ctc=30.707, loss_att=14.994, acc=0.855, loss=19.708, backward_time=0.052, grad_norm=189.765, clip=100.000, loss_scale=348.504, optim_step_time=0.033, optim0_lr0=2.130e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:00:40,388 (trainer:732) INFO: 93epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.101, loss_ctc=31.789, loss_att=15.613, acc=0.855, loss=20.466, backward_time=0.052, grad_norm=194.508, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.129e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:02:11,659 (trainer:732) INFO: 93epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.099, loss_ctc=33.605, loss_att=16.527, acc=0.851, loss=21.650, backward_time=0.051, grad_norm=197.987, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.129e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:03:41,935 (trainer:732) INFO: 93epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.097, loss_ctc=31.367, loss_att=15.307, acc=0.856, loss=20.125, backward_time=0.051, grad_norm=190.458, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.128e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:05:14,980 (trainer:732) INFO: 93epoch:train:3939-4296batch: iter_time=0.009, forward_time=0.099, loss_ctc=30.668, loss_att=15.011, acc=0.854, loss=19.708, backward_time=0.052, grad_norm=195.869, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.128e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:06:48,233 (trainer:732) INFO: 93epoch:train:4297-4654batch: iter_time=0.010, 
forward_time=0.099, loss_ctc=31.384, loss_att=15.335, acc=0.855, loss=20.150, backward_time=0.052, grad_norm=189.183, clip=100.000, loss_scale=268.872, optim_step_time=0.032, optim0_lr0=2.127e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:08:27,688 (trainer:732) INFO: 93epoch:train:4655-5012batch: iter_time=0.028, forward_time=0.098, loss_ctc=29.625, loss_att=14.455, acc=0.857, loss=19.006, backward_time=0.051, grad_norm=182.357, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.127e-05, train_time=0.278 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:09:59,312 (trainer:732) INFO: 93epoch:train:5013-5370batch: iter_time=0.009, forward_time=0.097, loss_ctc=30.224, loss_att=14.720, acc=0.859, loss=19.371, backward_time=0.051, grad_norm=194.711, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.126e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:11:34,044 (trainer:732) INFO: 93epoch:train:5371-5728batch: iter_time=0.012, forward_time=0.100, loss_ctc=30.018, loss_att=14.637, acc=0.858, loss=19.251, backward_time=0.052, grad_norm=181.478, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.125e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:13:08,796 (trainer:732) INFO: 93epoch:train:5729-6086batch: iter_time=0.017, forward_time=0.097, loss_ctc=29.661, loss_att=14.477, acc=0.857, loss=19.032, backward_time=0.052, grad_norm=187.694, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.125e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:14:36,350 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:14:47,604 (trainer:732) INFO: 93epoch:train:6087-6444batch: iter_time=0.025, forward_time=0.099, loss_ctc=29.569, loss_att=14.490, acc=0.860, loss=19.014, backward_time=0.052, grad_norm=179.583, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.124e-05, train_time=0.276 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:15:40,186 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:16:24,730 (trainer:732) INFO: 93epoch:train:6445-6802batch: iter_time=0.021, forward_time=0.098, loss_ctc=31.125, loss_att=15.215, acc=0.857, loss=19.988, backward_time=0.051, grad_norm=189.718, clip=100.000, loss_scale=514.868, optim_step_time=0.033, optim0_lr0=2.124e-05, train_time=0.271 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:18:04,166 (trainer:732) INFO: 93epoch:train:6803-7160batch: iter_time=0.031, forward_time=0.097, loss_ctc=30.545, loss_att=14.902, acc=0.855, loss=19.595, backward_time=0.052, grad_norm=192.963, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.123e-05, train_time=0.278 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:19:15,173 (trainer:338) INFO: 93epoch results: [train] iter_time=0.010, forward_time=0.098, loss_ctc=30.907, loss_att=15.107, acc=0.856, loss=19.847, backward_time=0.052, grad_norm=189.697, clip=100.000, loss_scale=451.773, optim_step_time=0.033, optim0_lr0=2.129e-05, train_time=0.260, time=31 minutes and 1.99 seconds, total_count=665973, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.116, cer_ctc=0.072, loss_att=7.430, acc=0.928, cer=0.045, wer=0.643, loss=9.436, time=16.75 seconds, total_count=4929, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.59 seconds, total_count=0, gpu_max_cached_mem_GB=26.869 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:19:18,709 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:19:18,733 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/83epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:19:18,734 (trainer:272) INFO: 94/100epoch started. Estimated time to finish: 3 hours, 43 minutes and 6.53 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:19:36,367 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:20:52,830 (trainer:732) INFO: 94epoch:train:1-358batch: iter_time=0.010, forward_time=0.102, loss_ctc=31.183, loss_att=15.190, acc=0.856, loss=19.988, backward_time=0.052, grad_norm=190.819, clip=100.000, loss_scale=299.025, optim_step_time=0.033, optim0_lr0=2.123e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:22:26,095 (trainer:732) INFO: 94epoch:train:359-716batch: iter_time=0.004, forward_time=0.104, loss_ctc=30.561, loss_att=14.990, acc=0.858, loss=19.661, backward_time=0.052, grad_norm=189.617, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.122e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:23:59,252 (trainer:732) INFO: 94epoch:train:717-1074batch: iter_time=0.006, forward_time=0.103, loss_ctc=31.244, loss_att=15.258, acc=0.856, loss=20.054, backward_time=0.052, grad_norm=190.645, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.121e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:25:33,149 (trainer:732) INFO: 94epoch:train:1075-1432batch: iter_time=0.004, forward_time=0.104, loss_ctc=30.351, loss_att=14.829, acc=0.858, loss=19.486, 
backward_time=0.052, grad_norm=182.807, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.121e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:27:05,639 (trainer:732) INFO: 94epoch:train:1433-1790batch: iter_time=0.007, forward_time=0.099, loss_ctc=31.249, loss_att=15.278, acc=0.856, loss=20.069, backward_time=0.051, grad_norm=193.783, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.120e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:28:38,822 (trainer:732) INFO: 94epoch:train:1791-2148batch: iter_time=0.011, forward_time=0.098, loss_ctc=30.924, loss_att=15.088, acc=0.856, loss=19.839, backward_time=0.052, grad_norm=187.572, clip=100.000, loss_scale=318.212, optim_step_time=0.033, optim0_lr0=2.120e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:30:11,922 (trainer:732) INFO: 94epoch:train:2149-2506batch: iter_time=0.010, forward_time=0.098, loss_ctc=30.866, loss_att=15.092, acc=0.858, loss=19.824, backward_time=0.052, grad_norm=186.302, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.119e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:31:05,400 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:31:44,571 (trainer:732) INFO: 94epoch:train:2507-2864batch: iter_time=0.010, forward_time=0.097, loss_ctc=30.077, loss_att=14.673, acc=0.858, loss=19.294, backward_time=0.052, grad_norm=178.854, clip=100.000, loss_scale=403.003, optim_step_time=0.032, optim0_lr0=2.119e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:33:21,020 (trainer:732) INFO: 94epoch:train:2865-3222batch: iter_time=0.018, forward_time=0.099, loss_ctc=30.230, loss_att=14.769, acc=0.858, loss=19.407, backward_time=0.052, grad_norm=190.500, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.118e-05, train_time=0.269
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:33:53,003 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:34:57,302 (trainer:732) INFO: 94epoch:train:3223-3580batch: iter_time=0.019, forward_time=0.098, loss_ctc=31.195, loss_att=15.254, acc=0.856, loss=20.037, backward_time=0.051, grad_norm=188.621, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.117e-05, train_time=0.269
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:35:53,958 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:36:31,711 (trainer:732) INFO: 94epoch:train:3581-3938batch: iter_time=0.011, forward_time=0.099, loss_ctc=31.736, loss_att=15.544, acc=0.855, loss=20.401, backward_time=0.052, grad_norm=196.571, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.117e-05, train_time=0.263
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:38:06,061 (trainer:732) INFO: 94epoch:train:3939-4296batch: iter_time=0.014, forward_time=0.098, loss_ctc=31.259, loss_att=15.302, acc=0.856, loss=20.089, backward_time=0.051, grad_norm=194.222, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.116e-05, train_time=0.263
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:39:40,601 (trainer:732) INFO: 94epoch:train:4297-4654batch: iter_time=0.014, forward_time=0.098, loss_ctc=31.729, loss_att=15.500, acc=0.853, loss=20.368, backward_time=0.052, grad_norm=188.119, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.116e-05, train_time=0.264
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:41:15,193 (trainer:732) INFO: 94epoch:train:4655-5012batch: iter_time=0.012, forward_time=0.099, loss_ctc=32.854, loss_att=16.074, acc=0.856, loss=21.108, backward_time=0.052, grad_norm=197.173, clip=100.000, loss_scale=470.525, optim_step_time=0.032, optim0_lr0=2.115e-05, train_time=0.264
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:41:40,698 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:42:03,351 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:42:50,961 (trainer:732) INFO: 94epoch:train:5013-5370batch: iter_time=0.018, forward_time=0.098, loss_ctc=29.765, loss_att=14.501, acc=0.858, loss=19.080, backward_time=0.051, grad_norm=187.311, clip=100.000, loss_scale=324.123, optim_step_time=0.033, optim0_lr0=2.115e-05, train_time=0.267
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:44:25,553 (trainer:732) INFO: 94epoch:train:5371-5728batch: iter_time=0.016, forward_time=0.097, loss_ctc=30.338, loss_att=14.786, acc=0.858, loss=19.452, backward_time=0.051, grad_norm=195.501, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.114e-05, train_time=0.264
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:46:01,374 (trainer:732) INFO: 94epoch:train:5729-6086batch: iter_time=0.021, forward_time=0.097, loss_ctc=29.819, loss_att=14.548, acc=0.858, loss=19.130, backward_time=0.051, grad_norm=191.764, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.114e-05, train_time=0.267
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:47:37,396 (trainer:732) INFO: 94epoch:train:6087-6444batch: iter_time=0.022, forward_time=0.096, loss_ctc=29.964, loss_att=14.652, acc=0.857, loss=19.245, backward_time=0.051, grad_norm=186.602, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.113e-05, train_time=0.268
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:49:13,689 (trainer:732) INFO: 94epoch:train:6445-6802batch: iter_time=0.017, forward_time=0.099, loss_ctc=32.604, loss_att=15.936, acc=0.853, loss=20.937, backward_time=0.051, grad_norm=192.399, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.112e-05, train_time=0.269
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:50:53,096 (trainer:732) INFO: 94epoch:train:6803-7160batch: iter_time=0.031, forward_time=0.097, loss_ctc=28.908, loss_att=14.123, acc=0.859, loss=18.559, backward_time=0.052, grad_norm=183.950, clip=100.000, loss_scale=293.184, optim_step_time=0.032, optim0_lr0=2.112e-05, train_time=0.277
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:52:01,744 (trainer:338) INFO: 94epoch results: [train] iter_time=0.014, forward_time=0.099, loss_ctc=30.827, loss_att=15.061, acc=0.857, loss=19.791, backward_time=0.052, grad_norm=189.662, clip=100.000, loss_scale=297.415, optim_step_time=0.033, optim0_lr0=2.117e-05, train_time=0.264, time=31 minutes and 35.08 seconds, total_count=673134, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.067, cer_ctc=0.072, loss_att=7.415, acc=0.928, cer=0.045, wer=0.651, loss=9.411, time=14.53 seconds, total_count=4982, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.4 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:52:05,180 (trainer:384) INFO: There are no improvements in this epoch
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:52:05,188 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/87epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:52:05,188 (trainer:272) INFO: 95/100epoch started. Estimated time to finish: 3 hours, 11 minutes and 28.92 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:53:33,155 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:53:38,479 (trainer:732) INFO: 95epoch:train:1-358batch: iter_time=0.004, forward_time=0.104, loss_ctc=30.074, loss_att=14.708, acc=0.859, loss=19.318, backward_time=0.052, grad_norm=185.433, clip=100.000, loss_scale=496.224, optim_step_time=0.032, optim0_lr0=2.111e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:55:11,478 (trainer:732) INFO: 95epoch:train:359-716batch: iter_time=8.678e-04, forward_time=0.105, loss_ctc=31.422, loss_att=15.306, acc=0.857, loss=20.141, backward_time=0.051, grad_norm=194.984, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.111e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:56:42,552 (trainer:732) INFO: 95epoch:train:717-1074batch: iter_time=0.001, forward_time=0.102, loss_ctc=31.495, loss_att=15.407, acc=0.855, loss=20.233, backward_time=0.051, grad_norm=190.985, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.110e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:58:11,365 (trainer:732) INFO: 95epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.097, loss_ctc=30.243, loss_att=14.763, acc=0.858, loss=19.407, backward_time=0.052, grad_norm=188.687, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.110e-05, train_time=0.248
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 10:59:42,635 (trainer:732) INFO: 95epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.099, loss_ctc=31.139, loss_att=15.240, acc=0.856, loss=20.009, backward_time=0.051, grad_norm=193.468, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.109e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:01:14,008 (trainer:732) INFO: 95epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.098, loss_ctc=30.144, loss_att=14.676, acc=0.858, loss=19.316, backward_time=0.051, grad_norm=190.212, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.108e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:02:45,369 (trainer:732) INFO: 95epoch:train:2149-2506batch: iter_time=0.007, forward_time=0.098, loss_ctc=29.443, loss_att=14.387, acc=0.860, loss=18.904, backward_time=0.051, grad_norm=190.580, clip=100.000, loss_scale=377.564, optim_step_time=0.032, optim0_lr0=2.108e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:04:17,360 (trainer:732) INFO: 95epoch:train:2507-2864batch: iter_time=0.007, forward_time=0.098, loss_ctc=30.238, loss_att=14.779, acc=0.857, loss=19.417, backward_time=0.052, grad_norm=190.166, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.107e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:05:48,407 (trainer:732) INFO: 95epoch:train:2865-3222batch: iter_time=0.008, forward_time=0.097, loss_ctc=30.752, loss_att=15.082, acc=0.857, loss=19.783, backward_time=0.051, grad_norm=191.420, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.107e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:07:20,353 (trainer:732) INFO: 95epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.098, loss_ctc=30.392, loss_att=14.815, acc=0.859, loss=19.488, backward_time=0.053, grad_norm=194.767, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.106e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:08:52,506 (trainer:732) INFO: 95epoch:train:3581-3938batch: iter_time=0.008, forward_time=0.098, loss_ctc=30.116, loss_att=14.692, acc=0.857, loss=19.319, backward_time=0.051, grad_norm=190.638, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.106e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:10:24,922 (trainer:732) INFO: 95epoch:train:3939-4296batch: iter_time=0.011, forward_time=0.097, loss_ctc=29.377, loss_att=14.322, acc=0.862, loss=18.839, backward_time=0.052, grad_norm=188.331, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.105e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:11:11,670 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:11:55,904 (trainer:732) INFO: 95epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.097, loss_ctc=31.042, loss_att=15.190, acc=0.856, loss=19.946, backward_time=0.051, grad_norm=182.519, clip=100.000, loss_scale=717.087, optim_step_time=0.033, optim0_lr0=2.105e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:13:29,140 (trainer:732) INFO: 95epoch:train:4655-5012batch: iter_time=0.008, forward_time=0.099, loss_ctc=31.759, loss_att=15.547, acc=0.856, loss=20.410, backward_time=0.051, grad_norm=197.373, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.104e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:15:01,348 (trainer:732) INFO: 95epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.098, loss_ctc=30.683, loss_att=14.993, acc=0.859, loss=19.700, backward_time=0.052, grad_norm=197.718, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.103e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:15:52,310 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:16:33,880 (trainer:732) INFO: 95epoch:train:5371-5728batch: iter_time=0.007, forward_time=0.099, loss_ctc=30.990, loss_att=15.160, acc=0.857, loss=19.909, backward_time=0.052, grad_norm=189.771, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.103e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:16:40,705 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:18:06,788 (trainer:732) INFO: 95epoch:train:5729-6086batch: iter_time=0.010, forward_time=0.098, loss_ctc=31.473, loss_att=15.361, acc=0.855, loss=20.194, backward_time=0.052, grad_norm=187.922, clip=100.000, loss_scale=275.361, optim_step_time=0.033, optim0_lr0=2.102e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:18:13,593 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:19:39,356 (trainer:732) INFO: 95epoch:train:6087-6444batch: iter_time=0.010, forward_time=0.097, loss_ctc=30.822, loss_att=15.003, acc=0.857, loss=19.749, backward_time=0.051, grad_norm=181.942, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.102e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:20:09,106 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:21:12,598 (trainer:732) INFO: 95epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.098, loss_ctc=31.365, loss_att=15.408, acc=0.855, loss=20.195, backward_time=0.051, grad_norm=188.276, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.101e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:22:46,484 (trainer:732) INFO: 95epoch:train:6803-7160batch: iter_time=0.012, forward_time=0.099, loss_ctc=30.558, loss_att=14.939, acc=0.856, loss=19.625, backward_time=0.052, grad_norm=183.434, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.101e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:23:55,111 (trainer:338) INFO: 95epoch results: [train] iter_time=0.007, forward_time=0.099, loss_ctc=30.668, loss_att=14.984, acc=0.857, loss=19.689, backward_time=0.052, grad_norm=189.935, clip=100.000, loss_scale=400.452, optim_step_time=0.033, optim0_lr0=2.106e-05, train_time=0.257, time=30 minutes and 42.03 seconds, total_count=680295, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.059, cer_ctc=0.071, loss_att=7.386, acc=0.929, cer=0.045, wer=0.647, loss=9.388, time=14.5 seconds, total_count=5035, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.38 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:23:58,738 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:23:58,761 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/85epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:23:58,761 (trainer:272) INFO: 96/100epoch started. Estimated time to finish: 2 hours, 39 minutes and 33.83 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:25:31,464 (trainer:732) INFO: 96epoch:train:1-358batch: iter_time=0.008, forward_time=0.101, loss_ctc=30.495, loss_att=14.788, acc=0.858, loss=19.500, backward_time=0.051, grad_norm=178.642, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.100e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:27:01,055 (trainer:732) INFO: 96epoch:train:359-716batch: iter_time=0.002, forward_time=0.098, loss_ctc=30.297, loss_att=14.804, acc=0.859, loss=19.452, backward_time=0.051, grad_norm=190.614, clip=100.000, loss_scale=342.525, optim_step_time=0.032, optim0_lr0=2.100e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:28:31,089 (trainer:732) INFO: 96epoch:train:717-1074batch: iter_time=0.001, forward_time=0.098, loss_ctc=31.967, loss_att=15.586, acc=0.855, loss=20.500, backward_time=0.051, grad_norm=194.754, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.099e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:29:42,396 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:30:01,248 (trainer:732) INFO: 96epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.098, loss_ctc=30.526, loss_att=14.890, acc=0.860, loss=19.581, backward_time=0.052, grad_norm=184.300, clip=100.000, loss_scale=456.784, optim_step_time=0.032, optim0_lr0=2.098e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:30:53,928 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:31:31,803 (trainer:732) INFO: 96epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.098, loss_ctc=30.967, loss_att=15.113, acc=0.858, loss=19.869, backward_time=0.052, grad_norm=191.244, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.098e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:33:02,740 (trainer:732) INFO: 96epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.098, loss_ctc=30.459, loss_att=14.856, acc=0.858, loss=19.537, backward_time=0.051, grad_norm=184.238, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.097e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:34:32,911 (trainer:732) INFO: 96epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.098, loss_ctc=30.870, loss_att=15.084, acc=0.858, loss=19.820, backward_time=0.051, grad_norm=189.851, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.097e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:36:02,844 (trainer:732) INFO: 96epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.097, loss_ctc=30.642, loss_att=14.945, acc=0.859, loss=19.654, backward_time=0.051, grad_norm=195.207, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.096e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:37:35,001 (trainer:732) INFO: 96epoch:train:2865-3222batch: iter_time=0.009, forward_time=0.098, loss_ctc=29.735, loss_att=14.491, acc=0.860, loss=19.064, backward_time=0.052, grad_norm=198.236, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.096e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:39:06,301 (trainer:732) INFO: 96epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.098, loss_ctc=30.974, loss_att=15.168, acc=0.857, loss=19.910, backward_time=0.051, grad_norm=193.815, clip=100.000, loss_scale=416.894, optim_step_time=0.032, optim0_lr0=2.095e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:40:38,510 (trainer:732) INFO: 96epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.098, loss_ctc=30.576, loss_att=14.901, acc=0.856, loss=19.604, backward_time=0.051, grad_norm=189.058, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.095e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:42:09,759 (trainer:732) INFO: 96epoch:train:3939-4296batch: iter_time=0.008, forward_time=0.097, loss_ctc=29.791, loss_att=14.575, acc=0.858, loss=19.140, backward_time=0.051, grad_norm=186.937, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.094e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:43:08,053 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:43:42,809 (trainer:732) INFO: 96epoch:train:4297-4654batch: iter_time=0.009, forward_time=0.099, loss_ctc=30.064, loss_att=14.659, acc=0.859, loss=19.281, backward_time=0.051, grad_norm=183.862, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.094e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:45:14,382 (trainer:732) INFO: 96epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.098, loss_ctc=30.905, loss_att=15.109, acc=0.855, loss=19.848, backward_time=0.052, grad_norm=192.608, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.093e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:46:47,387 (trainer:732) INFO: 96epoch:train:5013-5370batch: iter_time=0.010, forward_time=0.098, loss_ctc=30.560, loss_att=14.925, acc=0.858, loss=19.616, backward_time=0.051, grad_norm=192.441, clip=100.000, loss_scale=533.453, optim_step_time=0.033, optim0_lr0=2.092e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:46:56,675 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:47:12,566 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:48:20,083 (trainer:732) INFO: 96epoch:train:5371-5728batch: iter_time=0.008, forward_time=0.099, loss_ctc=30.426, loss_att=14.890, acc=0.857, loss=19.550, backward_time=0.051, grad_norm=186.815, clip=100.000, loss_scale=649.681, optim_step_time=0.033, optim0_lr0=2.092e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:49:53,259 (trainer:732) INFO: 96epoch:train:5729-6086batch: iter_time=0.010, forward_time=0.099, loss_ctc=31.652, loss_att=15.462, acc=0.856, loss=20.319, backward_time=0.051, grad_norm=189.373, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.091e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:51:28,853 (trainer:732) INFO: 96epoch:train:6087-6444batch: iter_time=0.013, forward_time=0.100, loss_ctc=30.620, loss_att=14.967, acc=0.857, loss=19.663, backward_time=0.052, grad_norm=191.941, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.091e-05, train_time=0.267
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:53:01,150 (trainer:732) INFO: 96epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.097, loss_ctc=31.113, loss_att=15.152, acc=0.856, loss=19.940, backward_time=0.051, grad_norm=187.952, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.090e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:54:36,474 (trainer:732) INFO: 96epoch:train:6803-7160batch: iter_time=0.015, forward_time=0.099, loss_ctc=29.603, loss_att=14.434, acc=0.861, loss=18.985, backward_time=0.051, grad_norm=184.730, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.090e-05, train_time=0.266
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:55:44,968 (trainer:338) INFO: 96epoch results: [train] iter_time=0.007, forward_time=0.098, loss_ctc=30.603, loss_att=14.936, acc=0.858, loss=19.636, backward_time=0.051, grad_norm=189.325, clip=100.000, loss_scale=427.143, optim_step_time=0.033, optim0_lr0=2.095e-05, train_time=0.256, time=30 minutes and 38.33 seconds, total_count=687456, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.040, cer_ctc=0.072, loss_att=7.364, acc=0.928, cer=0.046, wer=0.653, loss=9.367, time=14.53 seconds, total_count=5088, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.35 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:55:48,580 (trainer:384) INFO: There are no improvements in this epoch
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:55:48,601 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/86epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:55:48,601 (trainer:272) INFO: 97/100epoch started. Estimated time to finish: 2 hours, 7 minutes and 38.24 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:56:28,254 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:57:21,919 (trainer:732) INFO: 97epoch:train:1-358batch: iter_time=0.003, forward_time=0.105, loss_ctc=31.665, loss_att=15.363, acc=0.858, loss=20.253, backward_time=0.051, grad_norm=191.168, clip=100.000, loss_scale=364.997, optim_step_time=0.033, optim0_lr0=2.089e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 11:58:51,512 (trainer:732) INFO: 97epoch:train:359-716batch: iter_time=0.003, forward_time=0.098, loss_ctc=29.862, loss_att=14.529, acc=0.858, loss=19.129, backward_time=0.051, grad_norm=186.299, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.089e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:00:21,773 (trainer:732) INFO: 97epoch:train:717-1074batch: iter_time=0.003, forward_time=0.098, loss_ctc=30.363, loss_att=14.832, acc=0.858, loss=19.491, backward_time=0.051, grad_norm=191.495, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.088e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:01:53,463 (trainer:732) INFO: 97epoch:train:1075-1432batch: iter_time=0.005, forward_time=0.099, loss_ctc=30.459, loss_att=14.844, acc=0.858, loss=19.528, backward_time=0.051, grad_norm=187.442, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.088e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:03:23,846 (trainer:732) INFO: 97epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.098, loss_ctc=31.404, loss_att=15.272, acc=0.858, loss=20.112, backward_time=0.052, grad_norm=188.005, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.087e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:04:55,853 (trainer:732) INFO: 97epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.100, loss_ctc=30.733, loss_att=15.009, acc=0.857, loss=19.726, backward_time=0.051, grad_norm=195.770, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.086e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:06:08,734 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:06:27,237 (trainer:732) INFO: 97epoch:train:2149-2506batch: iter_time=0.005, forward_time=0.098, loss_ctc=31.049, loss_att=15.144, acc=0.856, loss=19.915, backward_time=0.051, grad_norm=196.592, clip=100.000, loss_scale=508.425, optim_step_time=0.032, optim0_lr0=2.086e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:07:35,771 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:07:57,727 (trainer:732) INFO: 97epoch:train:2507-2864batch: iter_time=0.005, forward_time=0.097, loss_ctc=29.538, loss_att=14.429, acc=0.859, loss=18.961, backward_time=0.051, grad_norm=181.695, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.085e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:09:28,662 (trainer:732) INFO: 97epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.098, loss_ctc=30.365, loss_att=14.829, acc=0.861, loss=19.490, backward_time=0.051, grad_norm=184.691, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.085e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:11:00,857 (trainer:732) INFO: 97epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.098, loss_ctc=30.212, loss_att=14.724, acc=0.859, loss=19.370, backward_time=0.052, grad_norm=184.748, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.084e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:12:31,725 (trainer:732) INFO: 97epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.097, loss_ctc=31.401, loss_att=15.268, acc=0.855, loss=20.108, backward_time=0.051, grad_norm=192.461, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.084e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:13:40,893 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:14:04,660 (trainer:732) INFO: 97epoch:train:3939-4296batch: iter_time=0.010, forward_time=0.098, loss_ctc=29.744, loss_att=14.525, acc=0.861, loss=19.091, backward_time=0.052, grad_norm=184.163, clip=100.000, loss_scale=583.709, optim_step_time=0.032, optim0_lr0=2.083e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:14:21,477 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:15:06,964 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:15:34,751 (trainer:732) INFO: 97epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.098, loss_ctc=31.334, loss_att=15.284, acc=0.858, loss=20.099, backward_time=0.051, grad_norm=189.135, clip=100.000, loss_scale=431.686, optim_step_time=0.032, optim0_lr0=2.083e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:17:06,656 (trainer:732) INFO: 97epoch:train:4655-5012batch: iter_time=0.007, forward_time=0.099, loss_ctc=30.249, loss_att=14.791, acc=0.858, loss=19.429, backward_time=0.051, grad_norm=187.423, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.082e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:18:38,195 (trainer:732) INFO: 97epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.098, loss_ctc=30.121, loss_att=14.696, acc=0.858, loss=19.324, backward_time=0.052, grad_norm=185.777, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.082e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:20:08,265 (trainer:732) INFO: 97epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.097, loss_ctc=31.372, loss_att=15.324, acc=0.857, loss=20.138, backward_time=0.051, grad_norm=198.249, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.081e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:21:41,930 (trainer:732) INFO: 97epoch:train:5729-6086batch: iter_time=0.009, forward_time=0.100, loss_ctc=30.561, loss_att=14.965, acc=0.857, loss=19.644, backward_time=0.051, grad_norm=188.153, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.081e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:23:14,330 (trainer:732) INFO: 97epoch:train:6087-6444batch: iter_time=0.008, forward_time=0.099, loss_ctc=29.072, loss_att=14.206, acc=0.860, loss=18.666, backward_time=0.052, grad_norm=186.661, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.080e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:24:48,775 (trainer:732) INFO: 97epoch:train:6445-6802batch: iter_time=0.016, forward_time=0.097, loss_ctc=29.927, loss_att=14.621, acc=0.859, loss=19.213, backward_time=0.051, grad_norm=183.291, clip=100.000, loss_scale=441.922, optim_step_time=0.032, optim0_lr0=2.079e-05, train_time=0.264
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:26:07,454 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:26:23,474 (trainer:732) INFO: 97epoch:train:6803-7160batch: iter_time=0.013, forward_time=0.099, loss_ctc=31.245, loss_att=15.286, acc=0.857, loss=20.073, backward_time=0.052, grad_norm=188.883, clip=100.000, loss_scale=468.258, optim_step_time=0.033, optim0_lr0=2.079e-05, train_time=0.264
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:27:32,142 (trainer:338) INFO: 97epoch results: [train] iter_time=0.006, forward_time=0.099, loss_ctc=30.522, loss_att=14.892, acc=0.858, loss=19.581, backward_time=0.051, grad_norm=188.597, clip=100.000, loss_scale=370.283, optim_step_time=0.033, optim0_lr0=2.084e-05, train_time=0.256, time=30 minutes and 35.49 seconds, total_count=694617, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=14.063, cer_ctc=0.072, loss_att=7.374, acc=0.928, cer=0.046, wer=0.648, loss=9.381, time=14.59 seconds, total_count=5141, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.46 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:27:35,825 (trainer:384) INFO: There are no improvements in this epoch
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:27:35,849 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/84epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:27:35,849 (trainer:272) INFO: 98/100epoch started. Estimated time to finish: 1 hour, 35 minutes and 42.8 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:29:08,828 (trainer:732) INFO: 98epoch:train:1-358batch: iter_time=0.003, forward_time=0.104, loss_ctc=30.245, loss_att=14.729, acc=0.861, loss=19.384, backward_time=0.052, grad_norm=194.280, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.078e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:30:39,896 (trainer:732) INFO: 98epoch:train:359-716batch: iter_time=0.003, forward_time=0.099, loss_ctc=31.001, loss_att=15.128, acc=0.857, loss=19.890, backward_time=0.051, grad_norm=193.368, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.078e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:32:09,484 (trainer:732) INFO: 98epoch:train:717-1074batch: iter_time=7.096e-04, forward_time=0.098, loss_ctc=30.561, loss_att=14.917, acc=0.858, loss=19.611, backward_time=0.051, grad_norm=189.940, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.077e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:33:39,877 (trainer:732) INFO: 98epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.099, loss_ctc=32.535, loss_att=15.908, acc=0.855, loss=20.896, backward_time=0.051, grad_norm=186.167, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.077e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:35:11,446 (trainer:732) INFO: 98epoch:train:1433-1790batch: iter_time=0.005, forward_time=0.099, loss_ctc=30.594, loss_att=14.916, acc=0.859, loss=19.619, backward_time=0.052, grad_norm=190.271, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.076e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:35:43,447 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:36:43,343 (trainer:732) INFO: 98epoch:train:1791-2148batch: iter_time=0.006, forward_time=0.099, loss_ctc=29.397, loss_att=14.308, acc=0.861, loss=18.835, backward_time=0.051, grad_norm=187.195, clip=100.000, loss_scale=406.168, optim_step_time=0.033, optim0_lr0=2.076e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:38:13,992 (trainer:732) INFO: 98epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.099, loss_ctc=31.557, loss_att=15.354, acc=0.859, loss=20.215, backward_time=0.052, grad_norm=199.111, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.075e-05, train_time=0.253
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:39:45,526 (trainer:732) INFO: 98epoch:train:2507-2864batch: iter_time=0.007, forward_time=0.098, loss_ctc=29.308, loss_att=14.258, acc=0.860, loss=18.773, backward_time=0.051, grad_norm=192.087, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.075e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:39:46,204 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:41:06,496 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
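The "model files were removed" records reflect checkpoint pruning: only the n best epochs by the selection criterion (here valid.acc) are kept on disk, so after epoch 97 the now-worst-ranked 84epoch.pth is deleted. A hypothetical sketch of that selection logic, assuming n = 10 (the count is inferred from the final "Averaging 10best models" record, not from the config):

```python
def prune_checkpoints(scores, keep=10):
    """Return the epochs whose checkpoints should be removed, keeping only
    the `keep` best epochs by score (e.g. valid.acc). Hypothetical helper
    illustrating the trainer's keep-n-best behaviour, not ESPnet's actual code."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # best epoch first
    return sorted(ranked[keep:])  # everything outside the top `keep` goes

# Toy example: with 3 tracked epochs and keep=2, the lowest-scoring one is dropped.
print(prune_checkpoints({1: 0.5, 2: 0.9, 3: 0.7}, keep=2))  # [1]
```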
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:41:17,540 (trainer:732) INFO: 98epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.100, loss_ctc=31.998, loss_att=15.646, acc=0.858, loss=20.552, backward_time=0.051, grad_norm=201.511, clip=100.000, loss_scale=257.434, optim_step_time=0.032, optim0_lr0=2.074e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:42:48,914 (trainer:732) INFO: 98epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.098, loss_ctc=30.296, loss_att=14.802, acc=0.857, loss=19.451, backward_time=0.052, grad_norm=186.527, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.074e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:44:20,853 (trainer:732) INFO: 98epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.098, loss_ctc=31.164, loss_att=15.157, acc=0.857, loss=19.959, backward_time=0.051, grad_norm=192.947, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.073e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:45:51,928 (trainer:732) INFO: 98epoch:train:3939-4296batch: iter_time=0.008, forward_time=0.097, loss_ctc=29.573, loss_att=14.401, acc=0.861, loss=18.953, backward_time=0.051, grad_norm=184.854, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.073e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:47:24,004 (trainer:732) INFO: 98epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.099, loss_ctc=31.633, loss_att=15.435, acc=0.857, loss=20.294, backward_time=0.052, grad_norm=190.629, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.072e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:48:28,154 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:48:55,050 (trainer:732) INFO: 98epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.098, loss_ctc=30.550, loss_att=14.949, acc=0.856, loss=19.629, backward_time=0.051, grad_norm=185.973, clip=100.000, loss_scale=282.532, optim_step_time=0.032, optim0_lr0=2.071e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:50:28,130 (trainer:732) INFO: 98epoch:train:5013-5370batch: iter_time=0.009, forward_time=0.099, loss_ctc=29.018, loss_att=14.129, acc=0.861, loss=18.596, backward_time=0.052, grad_norm=190.824, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.071e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:52:01,821 (trainer:732) INFO: 98epoch:train:5371-5728batch: iter_time=0.012, forward_time=0.098, loss_ctc=29.145, loss_att=14.204, acc=0.862, loss=18.687, backward_time=0.052, grad_norm=183.781, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.070e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:53:35,339 (trainer:732) INFO: 98epoch:train:5729-6086batch: iter_time=0.012, forward_time=0.098, loss_ctc=29.782, loss_att=14.528, acc=0.861, loss=19.104, backward_time=0.051, grad_norm=181.955, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.070e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:54:40,675 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:55:08,279 (trainer:732) INFO: 98epoch:train:6087-6444batch: iter_time=0.012, forward_time=0.097, loss_ctc=30.516, loss_att=14.890, acc=0.858, loss=19.578, backward_time=0.051, grad_norm=188.702, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.069e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:56:41,648 (trainer:732) INFO: 98epoch:train:6445-6802batch: iter_time=0.012, forward_time=0.098, loss_ctc=29.806, loss_att=14.539, acc=0.859, loss=19.119, backward_time=0.051, grad_norm=182.265, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.069e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:58:15,109 (trainer:732) INFO: 98epoch:train:6803-7160batch: iter_time=0.010, forward_time=0.099, loss_ctc=29.838, loss_att=14.520, acc=0.858, loss=19.115, backward_time=0.052, grad_norm=188.570, clip=100.000, loss_scale=438.346, optim_step_time=0.033, optim0_lr0=2.068e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:59:23,995 (trainer:338) INFO: 98epoch results: [train] iter_time=0.007, forward_time=0.099, loss_ctc=30.408, loss_att=14.827, acc=0.859, loss=19.501, backward_time=0.051, grad_norm=189.566, clip=100.000, loss_scale=299.662, optim_step_time=0.033, optim0_lr0=2.073e-05, train_time=0.257, time=30 minutes and 39.95 seconds, total_count=701778, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=13.992, cer_ctc=0.071, loss_att=7.382, acc=0.928, cer=0.045, wer=0.648, loss=9.365, time=14.54 seconds, total_count=5194, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.65 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:59:27,624 (trainer:384) INFO: There are no improvements in this epoch
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:59:27,651 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/89epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 12:59:27,652 (trainer:272) INFO: 99/100epoch started. Estimated time to finish: 1 hour, 3 minutes and 48.35 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:00:58,543 (trainer:732) INFO: 99epoch:train:1-358batch: iter_time=0.003, forward_time=0.098, loss_ctc=31.131, loss_att=15.150, acc=0.857, loss=19.945, backward_time=0.051, grad_norm=193.347, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.068e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:02:28,074 (trainer:732) INFO: 99epoch:train:359-716batch: iter_time=0.003, forward_time=0.097, loss_ctc=29.701, loss_att=14.384, acc=0.862, loss=18.979, backward_time=0.051, grad_norm=197.260, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.067e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:03:35,635 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:03:59,153 (trainer:732) INFO: 99epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=30.950, loss_att=15.068, acc=0.860, loss=19.833, backward_time=0.051, grad_norm=196.892, clip=100.000, loss_scale=443.877, optim_step_time=0.032, optim0_lr0=2.067e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:05:30,559 (trainer:732) INFO: 99epoch:train:1075-1432batch: iter_time=0.005, forward_time=0.098, loss_ctc=29.467, loss_att=14.368, acc=0.860, loss=18.897, backward_time=0.051, grad_norm=189.275, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.066e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:07:01,038 (trainer:732) INFO: 99epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.098, loss_ctc=30.841, loss_att=15.056, acc=0.860, loss=19.792, backward_time=0.051, grad_norm=194.739, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.066e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:07:41,179 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:08:30,591 (trainer:732) INFO: 99epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.097, loss_ctc=29.902, loss_att=14.503, acc=0.862, loss=19.123, backward_time=0.051, grad_norm=187.591, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.065e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:10:01,706 (trainer:732) INFO: 99epoch:train:2149-2506batch: iter_time=0.005, forward_time=0.098, loss_ctc=30.002, loss_att=14.598, acc=0.860, loss=19.219, backward_time=0.051, grad_norm=188.249, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.065e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:11:33,109 (trainer:732) INFO: 99epoch:train:2507-2864batch: iter_time=0.006, forward_time=0.098, loss_ctc=30.284, loss_att=14.781, acc=0.859, loss=19.432, backward_time=0.051, grad_norm=190.695, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.064e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:12:55,591 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:13:04,501 (trainer:732) INFO: 99epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.099, loss_ctc=30.355, loss_att=14.811, acc=0.860, loss=19.474, backward_time=0.052, grad_norm=192.375, clip=100.000, loss_scale=429.765, optim_step_time=0.032, optim0_lr0=2.064e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:14:35,672 (trainer:732) INFO: 99epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.098, loss_ctc=29.814, loss_att=14.538, acc=0.860, loss=19.120, backward_time=0.051, grad_norm=190.796, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.063e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:16:09,043 (trainer:732) INFO: 99epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.104, loss_ctc=29.533, loss_att=14.418, acc=0.861, loss=18.952, backward_time=0.052, grad_norm=186.649, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.062e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:16:45,879 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:17:41,468 (trainer:732) INFO: 99epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.101, loss_ctc=30.447, loss_att=14.839, acc=0.858, loss=19.522, backward_time=0.051, grad_norm=193.544, clip=100.000, loss_scale=357.826, optim_step_time=0.032, optim0_lr0=2.062e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:18:45,491 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:19:13,553 (trainer:732) INFO: 99epoch:train:4297-4654batch: iter_time=0.007, forward_time=0.098, loss_ctc=30.846, loss_att=15.045, acc=0.861, loss=19.785, backward_time=0.051, grad_norm=192.652, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.061e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:20:45,104 (trainer:732) INFO: 99epoch:train:4655-5012batch: iter_time=0.007, forward_time=0.097, loss_ctc=29.901, loss_att=14.544, acc=0.860, loss=19.151, backward_time=0.052, grad_norm=198.735, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.061e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:22:14,615 (trainer:732) INFO: 99epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.096, loss_ctc=30.130, loss_att=14.743, acc=0.857, loss=19.359, backward_time=0.051, grad_norm=189.856, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.060e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:23:46,669 (trainer:732) INFO: 99epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.100, loss_ctc=31.432, loss_att=15.353, acc=0.857, loss=20.177, backward_time=0.051, grad_norm=199.330, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.060e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:25:19,895 (trainer:732) INFO: 99epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.102, loss_ctc=32.135, loss_att=15.719, acc=0.858, loss=20.644, backward_time=0.051, grad_norm=195.001, clip=100.000, loss_scale=259.575, optim_step_time=0.032, optim0_lr0=2.059e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:26:51,194 (trainer:732) INFO: 99epoch:train:6087-6444batch: iter_time=0.008, forward_time=0.098, loss_ctc=28.934, loss_att=14.073, acc=0.862, loss=18.531, backward_time=0.051, grad_norm=185.839, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.059e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:28:23,884 (trainer:732) INFO: 99epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.099, loss_ctc=30.857, loss_att=15.132, acc=0.857, loss=19.850, backward_time=0.052, grad_norm=194.867, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.058e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:29:38,770 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:29:56,917 (trainer:732) INFO: 99epoch:train:6803-7160batch: iter_time=0.010, forward_time=0.099, loss_ctc=29.972, loss_att=14.620, acc=0.858, loss=19.226, backward_time=0.051, grad_norm=183.338, clip=100.000, loss_scale=461.087, optim_step_time=0.033, optim0_lr0=2.058e-05, train_time=0.260
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:31:05,831 (trainer:338) INFO: 99epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=30.320, loss_att=14.781, acc=0.859, loss=19.443, backward_time=0.051, grad_norm=192.038, clip=100.000, loss_scale=366.368, optim_step_time=0.033, optim0_lr0=2.063e-05, train_time=0.255, time=30 minutes and 29.91 seconds, total_count=708939, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=13.844, cer_ctc=0.071, loss_att=7.300, acc=0.929, cer=0.045, wer=0.638, loss=9.263, time=14.76 seconds, total_count=5247, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.51 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:31:09,483 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:31:09,497 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/88epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:31:09,497 (trainer:272) INFO: 100/100epoch started. Estimated time to finish: 31 minutes and 53.72 seconds
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:32:41,624 (trainer:732) INFO: 100epoch:train:1-358batch: iter_time=0.002, forward_time=0.103, loss_ctc=30.445, loss_att=14.874, acc=0.860, loss=19.545, backward_time=0.051, grad_norm=198.795, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.057e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:33:18,473 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:34:11,601 (trainer:732) INFO: 100epoch:train:359-716batch: iter_time=0.003, forward_time=0.098, loss_ctc=29.767, loss_att=14.516, acc=0.860, loss=19.091, backward_time=0.052, grad_norm=185.833, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.057e-05, train_time=0.251
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:35:41,095 (trainer:732) INFO: 100epoch:train:717-1074batch: iter_time=0.002, forward_time=0.098, loss_ctc=29.507, loss_att=14.334, acc=0.860, loss=18.886, backward_time=0.052, grad_norm=186.888, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.056e-05, train_time=0.250
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:37:12,377 (trainer:732) INFO: 100epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.099, loss_ctc=31.663, loss_att=15.414, acc=0.856, loss=20.289, backward_time=0.052, grad_norm=192.153, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.056e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:38:42,550 (trainer:732) INFO: 100epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.098, loss_ctc=29.082, loss_att=14.163, acc=0.861, loss=18.639, backward_time=0.051, grad_norm=184.274, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.055e-05, train_time=0.252
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:40:14,054 (trainer:732) INFO: 100epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.099, loss_ctc=30.274, loss_att=14.780, acc=0.860, loss=19.428, backward_time=0.051, grad_norm=186.109, clip=100.000, loss_scale=413.318, optim_step_time=0.032, optim0_lr0=2.055e-05, train_time=0.255
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:41:45,249 (trainer:732) INFO: 100epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.099, loss_ctc=29.895, loss_att=14.557, acc=0.860, loss=19.158, backward_time=0.051, grad_norm=194.050, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.054e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:43:08,615 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:43:16,367 (trainer:732) INFO: 100epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.099, loss_ctc=29.828, loss_att=14.490, acc=0.863, loss=19.092, backward_time=0.051, grad_norm=192.297, clip=100.000, loss_scale=490.487, optim_step_time=0.033, optim0_lr0=2.054e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:44:48,474 (trainer:732) INFO: 100epoch:train:2865-3222batch: iter_time=0.009, forward_time=0.097, loss_ctc=29.598, loss_att=14.457, acc=0.862, loss=18.999, backward_time=0.051, grad_norm=191.053, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.053e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:46:20,386 (trainer:732) INFO: 100epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.100, loss_ctc=32.256, loss_att=15.752, acc=0.858, loss=20.703, backward_time=0.052, grad_norm=196.811, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.053e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:46:26,389 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:47:51,446 (trainer:732) INFO: 100epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.098, loss_ctc=31.389, loss_att=15.309, acc=0.858, loss=20.133, backward_time=0.051, grad_norm=193.213, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.052e-05, train_time=0.254
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:49:20,660 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU. Please ensure that the data processing is correct and verify it.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:49:23,089 (trainer:732) INFO: 100epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.098, loss_ctc=31.206, loss_att=15.247, acc=0.856, loss=20.035, backward_time=0.051, grad_norm=189.991, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.052e-05, train_time=0.256
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:50:55,020 (trainer:732) INFO: 100epoch:train:4297-4654batch: iter_time=0.007, forward_time=0.098, loss_ctc=31.184, loss_att=15.226, acc=0.857, loss=20.013, backward_time=0.052, grad_norm=189.069, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.051e-05, train_time=0.257
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:52:27,344 (trainer:732) INFO: 100epoch:train:4655-5012batch: iter_time=0.009, forward_time=0.098, loss_ctc=29.397, loss_att=14.342, acc=0.862, loss=18.858, backward_time=0.051, grad_norm=180.974, clip=100.000, loss_scale=383.285, optim_step_time=0.033, optim0_lr0=2.051e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:54:00,706 (trainer:732) INFO: 100epoch:train:5013-5370batch: iter_time=0.010, forward_time=0.099, loss_ctc=30.251, loss_att=14.730, acc=0.861, loss=19.386, backward_time=0.051, grad_norm=186.433, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.050e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:55:33,054 (trainer:732) INFO: 100epoch:train:5371-5728batch: iter_time=0.010, forward_time=0.098, loss_ctc=29.572, loss_att=14.341, acc=0.862, loss=18.910, backward_time=0.051, grad_norm=189.671, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.050e-05, train_time=0.258
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:57:06,725 (trainer:732) INFO: 100epoch:train:5729-6086batch: iter_time=0.011, forward_time=0.099, loss_ctc=29.036, loss_att=14.166, acc=0.863, loss=18.627, backward_time=0.052, grad_norm=182.062, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.049e-05, train_time=0.261
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 13:58:40,458 (trainer:732) INFO: 100epoch:train:6087-6444batch: iter_time=0.011, forward_time=0.098, loss_ctc=30.633, loss_att=14.931, acc=0.857, loss=19.642, backward_time=0.051, grad_norm=190.318, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.049e-05, train_time=0.262
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 14:00:13,152 (trainer:732) INFO: 100epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.098, loss_ctc=30.401, loss_att=14.788, acc=0.859, loss=19.472, backward_time=0.051, grad_norm=201.559, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.048e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 14:01:08,158 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 14:01:34,283 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model.
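The "grad norm is inf" warnings and the fluctuating `loss_scale` values (256 → 512 → 583.709 → …) are the signature of dynamic loss scaling under mixed-precision training: when the unscaled gradient norm is non-finite, the optimizer step is skipped and the scale is reduced; after a run of successful steps the scale grows again. A simplified sketch of that decision (the real logic lives in `torch.cuda.amp.GradScaler`, which grows the scale only every `growth_interval` successful steps, not every step as here):

```python
import math

def step_with_dynamic_scaling(grad_norm, scale, growth=2.0, backoff=0.5):
    """Simplified GradScaler-style update: on a non-finite gradient norm,
    skip the optimizer step and back off the loss scale; otherwise apply
    the step and grow the scale. Returns (step_applied, new_scale)."""
    if not math.isfinite(grad_norm):
        return False, scale * backoff  # "Skipping updating the model."
    return True, scale * growth

print(step_with_dynamic_scaling(float("inf"), 512.0))  # (False, 256.0)
```

This explains why each inf-norm warning in the log is followed by batches reporting a lower (or intermediate averaged) `loss_scale`.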
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 14:01:45,871 (trainer:732) INFO: 100epoch:train:6803-7160batch: iter_time=0.011, forward_time=0.097, loss_ctc=29.581, loss_att=14.433, acc=0.861, loss=18.977, backward_time=0.051, grad_norm=189.871, clip=100.000, loss_scale=732.764, optim_step_time=0.033, optim0_lr0=2.047e-05, train_time=0.259
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 14:02:54,488 (trainer:338) INFO: 100epoch results: [train] iter_time=0.006, forward_time=0.098, loss_ctc=30.232, loss_att=14.734, acc=0.860, loss=19.384, backward_time=0.051, grad_norm=190.067, clip=100.000, loss_scale=382.462, optim_step_time=0.033, optim0_lr0=2.052e-05, train_time=0.256, time=30 minutes and 37 seconds, total_count=716100, gpu_max_cached_mem_GB=26.869, [valid] loss_ctc=13.686, cer_ctc=0.070, loss_att=7.285, acc=0.929, cer=0.045, wer=0.639, loss=9.205, time=14.58 seconds, total_count=5300, gpu_max_cached_mem_GB=26.869, [att_plot] time=53.42 seconds, total_count=0, gpu_max_cached_mem_GB=26.869
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 14:02:57,913 (trainer:386) INFO: The best model has been updated: valid.acc
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 14:02:57,925 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/90epoch.pth
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 14:02:57,925 (trainer:458) INFO: The training was finished at 100 epochs
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-qvsbqe-worker] 2023-05-15 14:02:57,957 (average_nbest_models:69) INFO: Averaging 10best models: criterion="valid.acc": exp/asr_train_raw_bpe2000_sp/valid.acc.ave_10best.pth
# Accounting: time=53600 threads=1
# Ended (code 0) at Mon May 15 14:03:04 CST 2023, elapsed time 53600 seconds
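The final `average_nbest_models` step writes `valid.acc.ave_10best.pth`, the element-wise mean of the parameters of the 10 checkpoints with the best validation accuracy. A minimal sketch of that averaging over plain state dicts (the real implementation loads each `.pth` file with torch and handles dtype details):

```python
def average_states(states):
    """Element-wise mean of a list of state dicts with identical keys,
    as done when averaging the n best checkpoints. Works for floats here;
    the same expression works for tensors, since sum/div broadcast."""
    keys = states[0].keys()
    return {k: sum(s[k] for s in states) / len(states) for k in keys}

# Toy example with two "checkpoints" of a single weight and bias.
print(average_states([{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]))
# {'w': 2.0, 'b': 1.0}
```

Averaging typically gives a small WER/CER gain over the single best checkpoint, which is why the recipe decodes with the `ave_10best` model rather than any individual epoch.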