# python3 -m espnet2.bin.asr_train --use_preprocessor true --bpemodel data/token_list/bpe_unigram2000/bpe.model --token_type bpe --token_list data/token_list/bpe_unigram2000/tokens.txt --non_linguistic_symbols none --cleaner none --g2p none --valid_data_path_and_name_and_type dump/raw/dev/wav.scp,speech,sound --valid_shape_file exp/asr_stats_raw_bpe2000_sp/valid/speech_shape --resume true --ignore_init_mismatch false --fold_length 80000 --output_dir exp/asr_train_raw_bpe2000_sp --config conf/train.yaml --frontend_conf fs=16k --normalize=global_mvn --normalize_conf stats_file=exp/asr_stats_raw_bpe2000_sp/train/feats_stats.npz --train_data_path_and_name_and_type dump/raw/train_sp/wav.scp,speech,sound --train_shape_file exp/asr_stats_raw_bpe2000_sp/train/speech_shape --fold_length 150 --train_data_path_and_name_and_type dump/raw/train_sp/text,text,text --train_shape_file exp/asr_stats_raw_bpe2000_sp/train/text_shape.bpe --valid_data_path_and_name_and_type dump/raw/dev/text,text,text --valid_shape_file exp/asr_stats_raw_bpe2000_sp/valid/text_shape.bpe --ngpu 1 --multiprocessing_distributed True
# Started at Fri May 12 18:28:58 CST 2023
# /mnt/bd/khassan-volume3/tools/espent_KSC_recipe_test/tools/miniconda/envs/espnet/bin/python3 /mnt/bd/khassan-volume3/tools/espent_KSC_recipe_test/espnet2/bin/asr_train.py --use_preprocessor true --bpemodel data/token_list/bpe_unigram2000/bpe.model --token_type bpe --token_list data/token_list/bpe_unigram2000/tokens.txt --non_linguistic_symbols none --cleaner none --g2p none --valid_data_path_and_name_and_type dump/raw/dev/wav.scp,speech,sound --valid_shape_file exp/asr_stats_raw_bpe2000_sp/valid/speech_shape --resume true --ignore_init_mismatch false --fold_length 80000 --output_dir exp/asr_train_raw_bpe2000_sp --config conf/train.yaml --frontend_conf fs=16k --normalize=global_mvn --normalize_conf stats_file=exp/asr_stats_raw_bpe2000_sp/train/feats_stats.npz --train_data_path_and_name_and_type dump/raw/train_sp/wav.scp,speech,sound --train_shape_file exp/asr_stats_raw_bpe2000_sp/train/speech_shape --fold_length 150 --train_data_path_and_name_and_type dump/raw/train_sp/text,text,text --train_shape_file exp/asr_stats_raw_bpe2000_sp/train/text_shape.bpe --valid_data_path_and_name_and_type dump/raw/dev/text,text,text --valid_shape_file exp/asr_stats_raw_bpe2000_sp/valid/text_shape.bpe --ngpu 1 --multiprocessing_distributed True
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:05,489 (asr:500) INFO: Vocabulary size: 2000
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.embed.conv.0.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.embed.conv.2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.embed.out.0.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.encoders.0.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.encoders.0.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.encoders.0.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.encoders.0.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.encoders.0.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.encoders.0.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.encoders.0.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.encoders.0.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.encoders.1.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.encoders.1.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,205 (initialize:88) INFO: Initialize encoder.encoders.1.self_attn.linear_v.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.1.self_attn.linear_out.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.1.feed_forward.w_1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.1.feed_forward.w_2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.1.norm1.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.1.norm2.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.2.self_attn.linear_q.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.2.self_attn.linear_k.bias to zeros
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.2.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.2.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.2.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.2.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.2.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.2.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.3.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.3.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.3.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.3.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.3.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize 
encoder.encoders.3.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.3.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.3.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.4.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.4.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.4.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.4.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.4.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.4.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.4.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.4.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.5.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize 
encoder.encoders.5.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,206 (initialize:88) INFO: Initialize encoder.encoders.5.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.5.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.5.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.5.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.5.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.5.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.6.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.6.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.6.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.6.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.6.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) 
INFO: Initialize encoder.encoders.6.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.6.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.6.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.7.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.7.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.7.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.7.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.7.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.7.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.7.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.7.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.8.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: 
Initialize encoder.encoders.8.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.8.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.8.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.8.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.8.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.8.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.8.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.9.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.9.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.9.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,207 (initialize:88) INFO: Initialize encoder.encoders.9.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.9.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 
(initialize:88) INFO: Initialize encoder.encoders.9.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.9.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.9.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.10.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.10.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.10.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.10.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.10.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.10.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.10.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.10.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.11.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 
(initialize:88) INFO: Initialize encoder.encoders.11.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.11.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.11.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.11.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.11.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.11.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.encoders.11.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize encoder.after_norm.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize decoder.after_norm.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize decoder.output_layer.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize decoder.decoders.0.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize decoder.decoders.0.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize 
decoder.decoders.0.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize decoder.decoders.0.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize decoder.decoders.0.src_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize decoder.decoders.0.src_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize decoder.decoders.0.src_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize decoder.decoders.0.src_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize decoder.decoders.0.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,208 (initialize:88) INFO: Initialize decoder.decoders.0.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.0.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.0.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.0.norm3.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize 
decoder.decoders.1.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.src_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.src_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.src_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.src_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.1.norm3.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize 
decoder.decoders.2.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.2.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.2.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.2.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.2.src_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.2.src_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.2.src_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.2.src_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.2.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.2.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.2.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.2.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: 
Initialize decoder.decoders.2.norm3.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.3.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.3.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.3.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.3.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,209 (initialize:88) INFO: Initialize decoder.decoders.3.src_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.3.src_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.3.src_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.3.src_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.3.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.3.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.3.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 
(initialize:88) INFO: Initialize decoder.decoders.3.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.3.norm3.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.src_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.src_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.src_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.src_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 
18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.4.norm3.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.self_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.self_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.self_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.self_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.src_attn.linear_q.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.src_attn.linear_k.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.src_attn.linear_v.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.src_attn.linear_out.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.feed_forward.w_1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 
18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.feed_forward.w_2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.norm1.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,210 (initialize:88) INFO: Initialize decoder.decoders.5.norm2.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,211 (initialize:88) INFO: Initialize decoder.decoders.5.norm3.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:06,211 (initialize:88) INFO: Initialize ctc.ctc_lo.bias to zeros [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:08,721 (abs_task:1201) INFO: pytorch.version=1.13.1, cuda.available=True, cudnn.version=8500, cudnn.benchmark=False, cudnn.deterministic=True [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:08,726 (abs_task:1202) INFO: Model structure: ESPnetASRModel( (frontend): DefaultFrontend( (stft): Stft(n_fft=512, win_length=512, hop_length=128, center=True, normalized=False, onesided=True) (frontend): Frontend() (logmel): LogMel(sr=16000, n_fft=512, n_mels=80, fmin=0, fmax=8000.0, htk=False) ) (specaug): SpecAug( (time_warp): TimeWarp(window=5, mode=bicubic) (freq_mask): MaskAlongAxis(mask_width_range=[0, 27], num_mask=2, axis=freq) (time_mask): MaskAlongAxisVariableMaxWidth(mask_width_ratio_range=[0.0, 0.05], num_mask=10, axis=time) ) (normalize): GlobalMVN(stats_file=exp/asr_stats_raw_bpe2000_sp/train/feats_stats.npz, norm_means=True, norm_vars=True) (encoder): TransformerEncoder( (embed): Conv2dSubsampling( (conv): Sequential( (0): Conv2d(1, 256, kernel_size=(3, 3), stride=(2, 2)) (1): ReLU() (2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2)) (3): ReLU() ) (out): Sequential( (0): Linear(in_features=4864, out_features=256, bias=True) (1): PositionalEncoding( 
(dropout): Dropout(p=0.1, inplace=False) ) ) ) (encoders): MultiSequential( (0): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (1): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (2): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): 
Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (3): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (4): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (5): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): 
Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (6): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (7): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), 
eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (8): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (9): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (10): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, 
inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (11): EncoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) (after_norm): LayerNorm((256,), eps=1e-12, elementwise_affine=True) ) (decoder): TransformerDecoder( (embed): Sequential( (0): Embedding(2000, 256) (1): PositionalEncoding( (dropout): Dropout(p=0.1, inplace=False) ) ) (after_norm): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (output_layer): Linear(in_features=256, out_features=2000, bias=True) (decoders): MultiSequential( (0): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, 
out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (1): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (2): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, 
bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (3): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): 
LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (4): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) (5): DecoderLayer( (self_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (src_attn): MultiHeadedAttention( (linear_q): Linear(in_features=256, out_features=256, bias=True) (linear_k): Linear(in_features=256, out_features=256, bias=True) (linear_v): Linear(in_features=256, out_features=256, bias=True) (linear_out): 
Linear(in_features=256, out_features=256, bias=True) (dropout): Dropout(p=0.0, inplace=False) ) (feed_forward): PositionwiseFeedForward( (w_1): Linear(in_features=256, out_features=2048, bias=True) (w_2): Linear(in_features=2048, out_features=256, bias=True) (dropout): Dropout(p=0.1, inplace=False) (activation): ReLU() ) (norm1): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm2): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (norm3): LayerNorm((256,), eps=1e-12, elementwise_affine=True) (dropout): Dropout(p=0.1, inplace=False) ) ) ) (criterion_att): LabelSmoothingLoss( (criterion): KLDivLoss() ) (ctc): CTC( (ctc_lo): Linear(in_features=256, out_features=2000, bias=True) (ctc_loss): CTCLoss() ) ) Model summary: Class Name: ESPnetASRModel Total Number of model parameters: 28.63 M Number of trainable parameters: 28.63 M (100.0%) Size: 114.53 MB Type: torch.float32 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:08,726 (abs_task:1205) INFO: Optimizer: Adam ( Parameter Group 0 amsgrad: False betas: (0.9, 0.999) capturable: False differentiable: False eps: 1e-08 foreach: None fused: False initial_lr: 0.0001 lr: 3.3333333333333334e-09 maximize: False weight_decay: 0 ) [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:08,726 (abs_task:1206) INFO: Scheduler: WarmupLR(warmup_steps=30000) [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:08,727 (abs_task:1215) INFO: Saving the configuration in exp/asr_train_raw_bpe2000_sp/config.yaml [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:10,068 (asr:471) INFO: Optional Data Names: ('text_spk2', 'text_spk3', 'text_spk4') [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:13,011 (abs_task:1570) INFO: [train] dataset: ESPnetDataset( speech: {"path": "dump/raw/train_sp/wav.scp", "type": "sound"} text: {"path": "dump/raw/train_sp/text", "type": "text"} preprocess: ) 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:13,011 (abs_task:1571) INFO: [train] Batch sampler: FoldedBatchSampler(N-batch=7161, batch_size=128, shape_files=['exp/asr_stats_raw_bpe2000_sp/train/speech_shape', 'exp/asr_stats_raw_bpe2000_sp/train/text_shape.bpe'], sort_in_batch=descending, sort_batch=descending) [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:13,012 (abs_task:1572) INFO: [train] mini-batch sizes summary: N-batch=7161, mean=61.7, min=12, max=128 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:13,069 (asr:471) INFO: Optional Data Names: ('text_spk2', 'text_spk3', 'text_spk4') [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:13,082 (abs_task:1570) INFO: [valid] dataset: ESPnetDataset( speech: {"path": "dump/raw/dev/wav.scp", "type": "sound"} text: {"path": "dump/raw/dev/text", "type": "text"} preprocess: ) [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:13,082 (abs_task:1571) INFO: [valid] Batch sampler: FoldedBatchSampler(N-batch=53, batch_size=128, shape_files=['exp/asr_stats_raw_bpe2000_sp/valid/speech_shape', 'exp/asr_stats_raw_bpe2000_sp/valid/text_shape.bpe'], sort_in_batch=descending, sort_batch=descending) [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:13,083 (abs_task:1572) INFO: [valid] mini-batch sizes summary: N-batch=53, mean=61.9, min=29, max=128 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:13,091 (asr:471) INFO: Optional Data Names: ('text_spk2', 'text_spk3', 'text_spk4') [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:13,116 (abs_task:1570) INFO: [plot_att] dataset: ESPnetDataset( speech: {"path": "dump/raw/dev/wav.scp", "type": "sound"} text: {"path": "dump/raw/dev/text", "type": "text"} preprocess: ) [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:13,116 
(abs_task:1571) INFO: [plot_att] Batch sampler: UnsortedBatchSampler(N-batch=3283, batch_size=1, key_file=exp/asr_stats_raw_bpe2000_sp/valid/speech_shape, [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:13,117 (abs_task:1572) INFO: [plot_att] mini-batch sizes summary: N-batch=3, mean=1.0, min=1, max=1 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:14,208 (trainer:159) INFO: The training was resumed using exp/asr_train_raw_bpe2000_sp/checkpoint.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:29:14,479 (trainer:284) INFO: 3/100epoch started [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:30:54,223 (trainer:732) INFO: 3epoch:train:1-358batch: iter_time=0.008, forward_time=0.114, loss_ctc=156.031, loss_att=110.131, acc=0.127, loss=123.901, backward_time=0.051, grad_norm=30.828, clip=100.000, loss_scale=2.048e+03, optim_step_time=0.034, optim0_lr0=4.830e-05, train_time=0.278 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:32:15,753 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:32:30,495 (trainer:732) INFO: 3epoch:train:359-716batch: iter_time=0.010, forward_time=0.104, loss_ctc=145.899, loss_att=102.693, acc=0.130, loss=115.654, backward_time=0.052, grad_norm=29.318, clip=100.000, loss_scale=3.255e+03, optim_step_time=0.035, optim0_lr0=4.950e-05, train_time=0.269 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:34:05,520 (trainer:732) INFO: 3epoch:train:717-1074batch: iter_time=0.009, forward_time=0.102, loss_ctc=148.393, loss_att=104.127, acc=0.131, loss=117.407, backward_time=0.052, grad_norm=28.532, clip=100.000, loss_scale=4.096e+03, optim_step_time=0.034, optim0_lr0=5.069e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:34:19,111 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:35:40,234 (trainer:732) INFO: 3epoch:train:1075-1432batch: iter_time=0.010, forward_time=0.102, loss_ctc=149.683, loss_att=104.715, acc=0.132, loss=118.205, backward_time=0.051, grad_norm=32.003, clip=100.000, loss_scale=2.335e+03, optim_step_time=0.034, optim0_lr0=5.188e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:37:14,820 (trainer:732) INFO: 3epoch:train:1433-1790batch: iter_time=0.010, forward_time=0.101, loss_ctc=143.272, loss_att=99.991, acc=0.134, loss=112.976, backward_time=0.051, grad_norm=26.756, clip=100.000, loss_scale=2.048e+03, optim_step_time=0.034, optim0_lr0=5.307e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:38:48,961 (trainer:732) INFO: 3epoch:train:1791-2148batch: iter_time=0.008, forward_time=0.101, loss_ctc=147.587, loss_att=102.671, acc=0.134, loss=116.146, backward_time=0.051, grad_norm=26.465, clip=100.000, loss_scale=2.048e+03, optim_step_time=0.034, optim0_lr0=5.427e-05, train_time=0.263 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:40:18,735 (trainer:663) WARNING: The grad norm is nan. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:40:22,141 (trainer:732) INFO: 3epoch:train:2149-2506batch: iter_time=0.008, forward_time=0.100, loss_ctc=148.158, loss_att=102.682, acc=0.135, loss=116.324, backward_time=0.050, grad_norm=30.366, clip=100.000, loss_scale=2.011e+03, optim_step_time=0.034, optim0_lr0=5.546e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:41:55,162 (trainer:732) INFO: 3epoch:train:2507-2864batch: iter_time=0.008, forward_time=0.100, loss_ctc=143.582, loss_att=99.393, acc=0.135, loss=112.650, backward_time=0.050, grad_norm=30.824, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=5.665e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:43:13,261 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:43:28,850 (trainer:732) INFO: 3epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.103, loss_ctc=152.907, loss_att=105.322, acc=0.135, loss=119.597, backward_time=0.051, grad_norm=28.588, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=5.784e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:45:05,954 (trainer:732) INFO: 3epoch:train:3223-3580batch: iter_time=0.017, forward_time=0.101, loss_ctc=143.084, loss_att=98.378, acc=0.139, loss=111.790, backward_time=0.054, grad_norm=26.598, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.035, optim0_lr0=5.904e-05, train_time=0.271 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:46:44,413 (trainer:732) INFO: 3epoch:train:3581-3938batch: iter_time=0.009, forward_time=0.106, loss_ctc=155.185, loss_att=106.277, acc=0.137, loss=120.949, backward_time=0.054, grad_norm=31.713, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.035, optim0_lr0=6.023e-05, train_time=0.275 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:47:46,152 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:48:17,463 (trainer:732) INFO: 3epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.101, loss_ctc=144.154, loss_att=98.462, acc=0.141, loss=112.170, backward_time=0.052, grad_norm=29.994, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=6.143e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:49:52,095 (trainer:732) INFO: 3epoch:train:4297-4654batch: iter_time=0.008, forward_time=0.102, loss_ctc=147.589, loss_att=100.633, acc=0.139, loss=114.720, backward_time=0.052, grad_norm=26.232, clip=100.000, loss_scale=1.485e+03, optim_step_time=0.034, optim0_lr0=6.262e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:51:27,873 (trainer:732) INFO: 3epoch:train:4655-5012batch: iter_time=0.009, forward_time=0.103, loss_ctc=145.380, loss_att=98.953, acc=0.141, loss=112.881, backward_time=0.054, grad_norm=33.309, clip=100.000, loss_scale=2.048e+03, optim_step_time=0.035, optim0_lr0=6.381e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:52:01,641 (trainer:663) WARNING: The grad norm is nan. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:53:04,118 (trainer:732) INFO: 3epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.105, loss_ctc=153.924, loss_att=104.331, acc=0.139, loss=119.209, backward_time=0.053, grad_norm=37.066, clip=100.000, loss_scale=1.383e+03, optim_step_time=0.035, optim0_lr0=6.500e-05, train_time=0.269 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:54:39,396 (trainer:732) INFO: 3epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.104, loss_ctc=152.394, loss_att=103.102, acc=0.139, loss=117.889, backward_time=0.052, grad_norm=35.824, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=6.619e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:56:14,393 (trainer:732) INFO: 3epoch:train:5729-6086batch: iter_time=0.003, forward_time=0.107, loss_ctc=152.631, loss_att=103.110, acc=0.141, loss=117.967, backward_time=0.052, grad_norm=37.394, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=6.739e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:57:52,070 (trainer:732) INFO: 3epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.110, loss_ctc=147.020, loss_att=98.985, acc=0.144, loss=113.395, backward_time=0.053, grad_norm=32.594, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=6.858e-05, train_time=0.273 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 18:59:26,435 (trainer:732) INFO: 3epoch:train:6445-6802batch: iter_time=0.004, forward_time=0.107, loss_ctc=144.844, loss_att=97.296, acc=0.145, loss=111.561, backward_time=0.053, grad_norm=31.525, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=6.978e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:01:02,839 (trainer:732) INFO: 3epoch:train:6803-7160batch: iter_time=0.007, 
forward_time=0.108, loss_ctc=146.357, loss_att=98.242, acc=0.145, loss=112.677, backward_time=0.052, grad_norm=32.139, clip=100.000, loss_scale=1.087e+03, optim_step_time=0.034, optim0_lr0=7.097e-05, train_time=0.269 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:02:16,460 (trainer:338) INFO: 3epoch results: [train] iter_time=0.008, forward_time=0.104, loss_ctc=148.325, loss_att=101.921, acc=0.137, loss=115.842, backward_time=0.052, grad_norm=30.900, clip=100.000, loss_scale=1.653e+03, optim_step_time=0.034, optim0_lr0=5.964e-05, train_time=0.266, time=31 minutes and 48.97 seconds, total_count=21483, gpu_max_cached_mem_GB=25.057, [valid] loss_ctc=144.412, cer_ctc=0.948, loss_att=93.319, acc=0.167, cer=0.756, wer=1.000, loss=108.647, time=15.36 seconds, total_count=159, gpu_max_cached_mem_GB=28.451, [att_plot] time=57.61 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:02:19,881 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:02:19,881 (trainer:272) INFO: 4/100epoch started. 
Estimated time to finish: 2 days, 5 hours and 29 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:03:53,413 (trainer:732) INFO: 4epoch:train:1-358batch: iter_time=0.004, forward_time=0.102, loss_ctc=144.896, loss_att=96.804, acc=0.147, loss=111.232, backward_time=0.056, grad_norm=28.517, clip=100.000, loss_scale=2.048e+03, optim_step_time=0.034, optim0_lr0=7.216e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:05:27,021 (trainer:732) INFO: 4epoch:train:359-716batch: iter_time=8.531e-04, forward_time=0.104, loss_ctc=152.359, loss_att=101.535, acc=0.145, loss=116.782, backward_time=0.057, grad_norm=30.125, clip=100.000, loss_scale=2.048e+03, optim_step_time=0.034, optim0_lr0=7.336e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:07:00,932 (trainer:732) INFO: 4epoch:train:717-1074batch: iter_time=7.812e-04, forward_time=0.104, loss_ctc=154.198, loss_att=102.357, acc=0.147, loss=117.909, backward_time=0.056, grad_norm=35.201, clip=100.000, loss_scale=2.048e+03, optim_step_time=0.034, optim0_lr0=7.455e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:08:35,233 (trainer:732) INFO: 4epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.105, loss_ctc=148.072, loss_att=98.125, acc=0.150, loss=113.109, backward_time=0.056, grad_norm=32.345, clip=100.000, loss_scale=2.048e+03, optim_step_time=0.034, optim0_lr0=7.575e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:10:07,774 (trainer:732) INFO: 4epoch:train:1433-1790batch: iter_time=7.027e-04, forward_time=0.103, loss_ctc=146.874, loss_att=97.274, acc=0.149, loss=112.154, backward_time=0.055, grad_norm=32.642, clip=100.000, loss_scale=2.048e+03, optim_step_time=0.034, optim0_lr0=7.694e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:11:19,615 
(trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:11:39,662 (trainer:663) WARNING: The grad norm is nan. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:11:42,661 (trainer:732) INFO: 4epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.105, loss_ctc=140.599, loss_att=92.850, acc=0.153, loss=107.175, backward_time=0.057, grad_norm=32.877, clip=100.000, loss_scale=2.482e+03, optim_step_time=0.035, optim0_lr0=7.813e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:12:07,419 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:13:16,678 (trainer:732) INFO: 4epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.104, loss_ctc=147.306, loss_att=97.009, acc=0.153, loss=112.098, backward_time=0.056, grad_norm=36.311, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=7.932e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:14:51,734 (trainer:732) INFO: 4epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.105, loss_ctc=140.686, loss_att=92.464, acc=0.156, loss=106.931, backward_time=0.056, grad_norm=33.179, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=8.051e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:16:24,490 (trainer:732) INFO: 4epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.101, loss_ctc=140.716, loss_att=92.446, acc=0.155, loss=106.927, backward_time=0.056, grad_norm=36.578, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=8.170e-05, train_time=0.259 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:17:35,076 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:18:00,295 (trainer:732) INFO: 4epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.105, loss_ctc=153.415, loss_att=100.358, acc=0.154, loss=116.275, backward_time=0.057, grad_norm=42.134, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=8.290e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:19:35,288 (trainer:732) INFO: 4epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.104, loss_ctc=142.573, loss_att=92.980, acc=0.159, loss=107.858, backward_time=0.056, grad_norm=38.428, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.035, optim0_lr0=8.409e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:21:11,310 (trainer:732) INFO: 4epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.106, loss_ctc=150.561, loss_att=98.291, acc=0.156, loss=113.972, backward_time=0.056, grad_norm=36.591, clip=100.000, loss_scale=1.479e+03, optim_step_time=0.034, optim0_lr0=8.529e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:22:47,909 (trainer:732) INFO: 4epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.105, loss_ctc=156.678, loss_att=101.816, acc=0.155, loss=118.275, backward_time=0.057, grad_norm=32.063, clip=100.000, loss_scale=2.048e+03, optim_step_time=0.034, optim0_lr0=8.648e-05, train_time=0.270 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:24:00,996 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:24:24,915 (trainer:732) INFO: 4epoch:train:4655-5012batch: iter_time=0.009, forward_time=0.105, loss_ctc=145.061, loss_att=94.250, acc=0.159, loss=109.493, backward_time=0.056, grad_norm=37.118, clip=100.000, loss_scale=1.793e+03, optim_step_time=0.034, optim0_lr0=8.767e-05, train_time=0.271 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:25:58,543 (trainer:732) INFO: 4epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.102, loss_ctc=145.306, loss_att=94.091, acc=0.161, loss=109.455, backward_time=0.057, grad_norm=37.097, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=8.886e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:26:32,213 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:26:55,717 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:27:34,582 (trainer:732) INFO: 4epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.105, loss_ctc=150.867, loss_att=97.525, acc=0.160, loss=113.528, backward_time=0.057, grad_norm=36.027, clip=100.000, loss_scale=689.838, optim_step_time=0.034, optim0_lr0=9.005e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:29:10,650 (trainer:732) INFO: 4epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.104, loss_ctc=148.138, loss_att=95.551, acc=0.163, loss=111.327, backward_time=0.058, grad_norm=35.139, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=9.124e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:30:47,746 (trainer:732) INFO: 4epoch:train:6087-6444batch: iter_time=0.006, forward_time=0.106, loss_ctc=153.505, loss_att=98.854, acc=0.162, loss=115.250, backward_time=0.057, grad_norm=47.676, clip=100.000, loss_scale=512.000, optim_step_time=0.035, optim0_lr0=9.244e-05, train_time=0.271 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:32:24,019 (trainer:732) INFO: 4epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.106, loss_ctc=152.109, loss_att=97.649, acc=0.165, loss=113.987, backward_time=0.058, grad_norm=34.720, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=9.363e-05, train_time=0.269 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:34:01,047 (trainer:732) INFO: 4epoch:train:6803-7160batch: iter_time=0.010, forward_time=0.104, loss_ctc=152.309, loss_att=97.382, acc=0.165, loss=113.860, backward_time=0.057, grad_norm=46.544, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=9.483e-05, train_time=0.271 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:35:12,768 (trainer:338) INFO: 4epoch results: [train] iter_time=0.005, forward_time=0.104, 
loss_ctc=148.207, loss_att=96.914, acc=0.156, loss=112.302, backward_time=0.057, grad_norm=36.072, clip=100.000, loss_scale=1.346e+03, optim_step_time=0.034, optim0_lr0=8.350e-05, train_time=0.265, time=31 minutes and 41.91 seconds, total_count=28644, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=144.917, cer_ctc=0.940, loss_att=89.657, acc=0.183, cer=0.719, wer=1.000, loss=106.235, time=14.83 seconds, total_count=212, gpu_max_cached_mem_GB=28.451, [att_plot] time=56.13 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:35:16,398 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:35:16,400 (trainer:272) INFO: 5/100epoch started. Estimated time to finish: 2 days, 4 hours and 49 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:35:29,509 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:35:41,485 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:36:50,178 (trainer:732) INFO: 5epoch:train:1-358batch: iter_time=0.004, forward_time=0.103, loss_ctc=142.984, loss_att=91.113, acc=0.170, loss=106.674, backward_time=0.057, grad_norm=68.807, clip=100.000, loss_scale=288.986, optim_step_time=0.035, optim0_lr0=9.602e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:38:24,169 (trainer:732) INFO: 5epoch:train:359-716batch: iter_time=0.002, forward_time=0.104, loss_ctc=150.110, loss_att=95.380, acc=0.171, loss=111.799, backward_time=0.056, grad_norm=48.377, clip=100.000, loss_scale=256.000, optim_step_time=0.035, optim0_lr0=9.721e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:39:49,603 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:39:57,599 (trainer:732) INFO: 5epoch:train:717-1074batch: iter_time=0.001, forward_time=0.103, loss_ctc=151.937, loss_att=96.689, acc=0.168, loss=113.263, backward_time=0.056, grad_norm=51.227, clip=100.000, loss_scale=244.885, optim_step_time=0.034, optim0_lr0=9.840e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:41:29,923 (trainer:732) INFO: 5epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.101, loss_ctc=143.170, loss_att=91.195, acc=0.170, loss=106.788, backward_time=0.054, grad_norm=88.170, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=9.957e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:42:04,149 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:43:02,029 (trainer:732) INFO: 5epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.101, loss_ctc=145.084, loss_att=92.149, acc=0.172, loss=108.030, backward_time=0.054, grad_norm=87.083, clip=100.000, loss_scale=87.485, optim_step_time=0.034, optim0_lr0=9.961e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:44:34,742 (trainer:732) INFO: 5epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.100, loss_ctc=149.782, loss_att=94.961, acc=0.172, loss=111.408, backward_time=0.054, grad_norm=60.085, clip=100.000, loss_scale=64.000, optim_step_time=0.033, optim0_lr0=9.903e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:46:07,353 (trainer:732) INFO: 5epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.101, loss_ctc=148.004, loss_att=93.756, acc=0.173, loss=110.030, backward_time=0.054, grad_norm=47.541, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=9.845e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:47:39,885 (trainer:732) INFO: 5epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.101, loss_ctc=148.514, loss_att=93.817, acc=0.175, loss=110.226, backward_time=0.054, grad_norm=37.771, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=9.789e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:49:12,606 (trainer:732) INFO: 5epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.101, loss_ctc=154.154, loss_att=96.808, acc=0.176, loss=114.012, backward_time=0.055, grad_norm=47.960, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=9.733e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:50:45,453 (trainer:732) INFO: 5epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.099, 
loss_ctc=150.313, loss_att=94.563, acc=0.176, loss=111.288, backward_time=0.057, grad_norm=41.370, clip=100.000, loss_scale=66.860, optim_step_time=0.033, optim0_lr0=9.679e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:51:08,130 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:52:13,932 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:52:20,348 (trainer:732) INFO: 5epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.101, loss_ctc=146.831, loss_att=92.541, acc=0.177, loss=108.828, backward_time=0.055, grad_norm=56.426, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=9.625e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:53:52,640 (trainer:732) INFO: 5epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.100, loss_ctc=145.874, loss_att=91.716, acc=0.179, loss=107.964, backward_time=0.054, grad_norm=53.008, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=9.572e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:55:26,613 (trainer:732) INFO: 5epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.101, loss_ctc=147.356, loss_att=92.345, acc=0.179, loss=108.848, backward_time=0.054, grad_norm=102.778, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=9.520e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:56:59,473 (trainer:732) INFO: 5epoch:train:4655-5012batch: iter_time=0.003, forward_time=0.101, loss_ctc=147.058, 
loss_att=91.669, acc=0.182, loss=108.285, backward_time=0.056, grad_norm=102.721, clip=100.000, loss_scale=128.000, optim_step_time=0.035, optim0_lr0=9.469e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:57:08,169 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 19:58:36,281 (trainer:732) INFO: 5epoch:train:5013-5370batch: iter_time=0.009, forward_time=0.104, loss_ctc=150.234, loss_att=93.622, acc=0.182, loss=110.605, backward_time=0.058, grad_norm=180.661, clip=100.000, loss_scale=69.737, optim_step_time=0.035, optim0_lr0=9.419e-05, train_time=0.270 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:00:11,699 (trainer:732) INFO: 5epoch:train:5371-5728batch: iter_time=0.009, forward_time=0.102, loss_ctc=145.327, loss_att=90.750, acc=0.183, loss=107.123, backward_time=0.057, grad_norm=88.472, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=9.370e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:01:47,407 (trainer:732) INFO: 5epoch:train:5729-6086batch: iter_time=0.009, forward_time=0.102, loss_ctc=151.529, loss_att=94.270, acc=0.183, loss=111.448, backward_time=0.055, grad_norm=62.246, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=9.321e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:03:22,410 (trainer:732) INFO: 5epoch:train:6087-6444batch: iter_time=0.008, forward_time=0.102, loss_ctc=148.187, loss_att=91.947, acc=0.184, loss=108.819, backward_time=0.055, grad_norm=61.078, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=9.273e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:04:58,370 (trainer:732) INFO: 5epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.102, loss_ctc=152.446, 
loss_att=94.908, acc=0.183, loss=112.170, backward_time=0.057, grad_norm=73.320, clip=100.000, loss_scale=64.000, optim_step_time=0.035, optim0_lr0=9.226e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:06:34,635 (trainer:732) INFO: 5epoch:train:6803-7160batch: iter_time=0.010, forward_time=0.103, loss_ctc=146.199, loss_att=90.631, acc=0.186, loss=107.301, backward_time=0.055, grad_norm=54.614, clip=100.000, loss_scale=84.559, optim_step_time=0.034, optim0_lr0=9.179e-05, train_time=0.269 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:07:44,159 (trainer:338) INFO: 5epoch results: [train] iter_time=0.005, forward_time=0.101, loss_ctc=148.208, loss_att=93.216, acc=0.177, loss=109.713, backward_time=0.055, grad_norm=70.666, clip=100.000, loss_scale=112.494, optim_step_time=0.034, optim0_lr0=9.600e-05, train_time=0.262, time=31 minutes and 18.97 seconds, total_count=35805, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=144.160, cer_ctc=0.946, loss_att=86.125, acc=0.210, cer=0.705, wer=1.000, loss=103.536, time=14.82 seconds, total_count=265, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.97 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:07:47,763 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:07:47,765 (trainer:272) INFO: 6/100epoch started. 
Estimated time to finish: 2 days, 4 hours and 54.04 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:09:21,443 (trainer:732) INFO: 6epoch:train:1-358batch: iter_time=0.004, forward_time=0.102, loss_ctc=147.502, loss_att=91.092, acc=0.187, loss=108.015, backward_time=0.055, grad_norm=43.349, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=9.133e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:10:53,516 (trainer:732) INFO: 6epoch:train:359-716batch: iter_time=0.001, forward_time=0.102, loss_ctc=148.361, loss_att=91.262, acc=0.191, loss=108.392, backward_time=0.054, grad_norm=45.637, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=9.088e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:11:27,494 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:12:26,600 (trainer:732) INFO: 6epoch:train:717-1074batch: iter_time=0.002, forward_time=0.103, loss_ctc=146.728, loss_att=90.412, acc=0.189, loss=107.307, backward_time=0.055, grad_norm=43.840, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=9.044e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:13:58,309 (trainer:732) INFO: 6epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.101, loss_ctc=146.548, loss_att=89.955, acc=0.193, loss=106.933, backward_time=0.055, grad_norm=59.948, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=9.000e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:14:48,610 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:15:31,973 (trainer:732) INFO: 6epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.104, loss_ctc=155.177, loss_att=94.981, acc=0.189, loss=113.040, backward_time=0.054, grad_norm=92.064, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=8.957e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:17:05,883 (trainer:732) INFO: 6epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.105, loss_ctc=143.078, loss_att=87.841, acc=0.194, loss=104.412, backward_time=0.054, grad_norm=59.307, clip=100.000, loss_scale=222.391, optim_step_time=0.033, optim0_lr0=8.914e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:18:38,365 (trainer:732) INFO: 6epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.103, loss_ctc=147.164, loss_att=90.115, acc=0.192, loss=107.230, backward_time=0.054, grad_norm=41.064, clip=100.000, 
loss_scale=256.000, optim_step_time=0.034, optim0_lr0=8.872e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:19:05,288 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:20:12,049 (trainer:732) INFO: 6epoch:train:2507-2864batch: iter_time=0.006, forward_time=0.102, loss_ctc=141.037, loss_att=86.763, acc=0.192, loss=103.045, backward_time=0.055, grad_norm=49.165, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=8.831e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:21:46,791 (trainer:732) INFO: 6epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.104, loss_ctc=147.564, loss_att=90.177, acc=0.194, loss=107.393, backward_time=0.054, grad_norm=42.714, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=8.790e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:23:21,495 (trainer:732) INFO: 6epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.104, loss_ctc=149.844, loss_att=91.687, acc=0.194, loss=109.134, backward_time=0.055, grad_norm=42.791, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=8.750e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:24:55,158 (trainer:732) INFO: 6epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.103, loss_ctc=150.413, loss_att=91.580, acc=0.197, loss=109.230, backward_time=0.055, grad_norm=32.792, clip=100.000, loss_scale=294.615, optim_step_time=0.034, optim0_lr0=8.710e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:26:29,599 (trainer:732) INFO: 6epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.103, loss_ctc=149.409, 
loss_att=91.316, acc=0.195, loss=108.744, backward_time=0.055, grad_norm=36.883, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=8.671e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:28:03,420 (trainer:732) INFO: 6epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.103, loss_ctc=146.659, loss_att=89.182, acc=0.199, loss=106.425, backward_time=0.055, grad_norm=47.081, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=8.632e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:29:37,727 (trainer:732) INFO: 6epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.103, loss_ctc=150.447, loss_att=91.343, acc=0.197, loss=109.074, backward_time=0.057, grad_norm=49.150, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=8.594e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:31:14,645 (trainer:732) INFO: 6epoch:train:5013-5370batch: iter_time=0.008, forward_time=0.105, loss_ctc=145.110, loss_att=88.254, acc=0.199, loss=105.311, backward_time=0.058, grad_norm=39.308, clip=100.000, loss_scale=512.000, optim_step_time=0.035, optim0_lr0=8.557e-05, train_time=0.270 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:32:51,444 (trainer:732) INFO: 6epoch:train:5371-5728batch: iter_time=0.009, forward_time=0.105, loss_ctc=146.870, loss_att=89.269, acc=0.198, loss=106.549, backward_time=0.056, grad_norm=49.877, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=8.519e-05, train_time=0.270 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:34:26,039 (trainer:732) INFO: 6epoch:train:5729-6086batch: iter_time=0.004, forward_time=0.104, loss_ctc=152.738, loss_att=92.826, acc=0.197, loss=110.799, backward_time=0.056, grad_norm=41.482, clip=100.000, loss_scale=800.894, optim_step_time=0.035, optim0_lr0=8.483e-05, 
train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:36:00,774 (trainer:732) INFO: 6epoch:train:6087-6444batch: iter_time=0.006, forward_time=0.104, loss_ctc=150.132, loss_att=90.748, acc=0.201, loss=108.563, backward_time=0.056, grad_norm=45.241, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.034, optim0_lr0=8.447e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:36:53,921 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:37:36,096 (trainer:732) INFO: 6epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.104, loss_ctc=148.500, loss_att=89.712, acc=0.202, loss=107.349, backward_time=0.057, grad_norm=50.988, clip=100.000, loss_scale=797.401, optim_step_time=0.034, optim0_lr0=8.411e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:38:03,556 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:39:11,003 (trainer:732) INFO: 6epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.103, loss_ctc=148.141, loss_att=89.570, acc=0.202, loss=107.141, backward_time=0.055, grad_norm=74.685, clip=100.000, loss_scale=329.860, optim_step_time=0.034, optim0_lr0=8.376e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:40:21,870 (trainer:338) INFO: 6epoch results: [train] iter_time=0.004, forward_time=0.103, loss_ctc=148.001, loss_att=90.365, acc=0.195, loss=107.655, backward_time=0.055, grad_norm=49.360, clip=100.000, loss_scale=384.590, optim_step_time=0.034, optim0_lr0=8.739e-05, train_time=0.263, time=31 minutes and 23.92 seconds, total_count=42966, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=144.396, cer_ctc=0.948, loss_att=83.668, acc=0.228, cer=0.704, wer=1.000, loss=101.887, time=14.83 seconds, total_count=318, gpu_max_cached_mem_GB=28.451, [att_plot] time=55.35 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:40:25,360 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:40:25,361 (trainer:272) INFO: 7/100epoch started. 
Estimated time to finish: 2 days, 3 hours and 22 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:42:01,416 (trainer:732) INFO: 7epoch:train:1-358batch: iter_time=0.002, forward_time=0.109, loss_ctc=148.552, loss_att=89.572, acc=0.203, loss=107.266, backward_time=0.055, grad_norm=55.029, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=8.341e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:43:00,333 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:43:36,317 (trainer:732) INFO: 7epoch:train:359-716batch: iter_time=0.001, forward_time=0.108, loss_ctc=147.222, loss_att=88.734, acc=0.202, loss=106.280, backward_time=0.055, grad_norm=75.572, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=8.306e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:45:11,079 (trainer:732) INFO: 7epoch:train:717-1074batch: iter_time=9.221e-04, forward_time=0.108, loss_ctc=151.686, loss_att=90.801, acc=0.205, loss=109.066, backward_time=0.055, grad_norm=63.458, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=8.272e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:45:44,564 (trainer:663) WARNING: The grad norm is nan. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:46:36,584 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:46:44,408 (trainer:732) INFO: 7epoch:train:1075-1432batch: iter_time=5.548e-04, forward_time=0.107, loss_ctc=146.132, loss_att=87.645, acc=0.205, loss=105.191, backward_time=0.055, grad_norm=108.370, clip=100.000, loss_scale=169.348, optim_step_time=0.034, optim0_lr0=8.239e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:48:19,576 (trainer:732) INFO: 7epoch:train:1433-1790batch: iter_time=7.021e-04, forward_time=0.109, loss_ctc=144.562, loss_att=86.754, acc=0.206, loss=104.097, backward_time=0.057, grad_norm=111.785, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=8.206e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:49:54,395 (trainer:732) INFO: 7epoch:train:1791-2148batch: iter_time=8.052e-04, forward_time=0.108, loss_ctc=151.512, loss_att=90.994, acc=0.204, loss=109.150, backward_time=0.055, grad_norm=127.092, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=8.173e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:51:30,122 (trainer:732) INFO: 7epoch:train:2149-2506batch: iter_time=0.001, forward_time=0.109, loss_ctc=150.210, loss_att=90.063, acc=0.205, loss=108.107, backward_time=0.057, grad_norm=131.055, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=8.141e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:53:04,426 (trainer:732) INFO: 7epoch:train:2507-2864batch: iter_time=8.970e-04, forward_time=0.107, loss_ctc=149.388, loss_att=89.406, acc=0.206, loss=107.401, backward_time=0.056, grad_norm=156.732, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=8.109e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:54:42,453 (trainer:732) INFO: 7epoch:train:2865-3222batch: iter_time=0.007, 
forward_time=0.109, loss_ctc=145.563, loss_att=86.917, acc=0.209, loss=104.511, backward_time=0.058, grad_norm=123.527, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=8.077e-05, train_time=0.274 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:56:19,314 (trainer:732) INFO: 7epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.108, loss_ctc=144.375, loss_att=86.291, acc=0.209, loss=103.716, backward_time=0.057, grad_norm=148.884, clip=100.000, loss_scale=95.821, optim_step_time=0.034, optim0_lr0=8.046e-05, train_time=0.270 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:56:46,410 (trainer:663) WARNING: The grad norm is nan. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:57:34,035 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:57:55,423 (trainer:732) INFO: 7epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.108, loss_ctc=153.297, loss_att=91.630, acc=0.206, loss=110.130, backward_time=0.055, grad_norm=179.297, clip=100.000, loss_scale=81.748, optim_step_time=0.034, optim0_lr0=8.015e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:58:42,930 (trainer:663) WARNING: The grad norm is nan. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 20:59:30,035 (trainer:732) INFO: 7epoch:train:3939-4296batch: iter_time=0.003, forward_time=0.106, loss_ctc=149.625, loss_att=89.436, acc=0.207, loss=107.492, backward_time=0.054, grad_norm=166.006, clip=100.000, loss_scale=47.866, optim_step_time=0.034, optim0_lr0=7.985e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:01:04,447 (trainer:732) INFO: 7epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.105, loss_ctc=148.196, loss_att=88.537, acc=0.208, loss=106.435, backward_time=0.055, grad_norm=99.704, clip=100.000, loss_scale=32.000, optim_step_time=0.034, optim0_lr0=7.954e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:02:39,513 (trainer:732) INFO: 7epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.106, loss_ctc=144.255, loss_att=86.210, acc=0.210, loss=103.623, backward_time=0.056, grad_norm=154.260, clip=100.000, loss_scale=32.000, optim_step_time=0.034, optim0_lr0=7.925e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:04:14,757 (trainer:732) INFO: 7epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.106, loss_ctc=145.702, loss_att=86.802, acc=0.210, loss=104.472, backward_time=0.056, grad_norm=132.981, clip=100.000, loss_scale=32.000, optim_step_time=0.034, optim0_lr0=7.895e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:05:48,995 (trainer:732) INFO: 7epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.105, loss_ctc=145.788, loss_att=87.245, acc=0.208, loss=104.808, backward_time=0.056, grad_norm=109.049, clip=100.000, loss_scale=32.000, optim_step_time=0.034, optim0_lr0=7.866e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:07:24,796 (trainer:732) INFO: 7epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.107, 
loss_ctc=154.922, loss_att=92.209, acc=0.209, loss=111.023, backward_time=0.056, grad_norm=153.109, clip=100.000, loss_scale=32.000, optim_step_time=0.034, optim0_lr0=7.837e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:09:02,813 (trainer:732) INFO: 7epoch:train:6087-6444batch: iter_time=0.010, forward_time=0.108, loss_ctc=142.090, loss_att=84.725, acc=0.212, loss=101.934, backward_time=0.057, grad_norm=128.874, clip=100.000, loss_scale=61.318, optim_step_time=0.034, optim0_lr0=7.808e-05, train_time=0.274 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:10:38,315 (trainer:732) INFO: 7epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.106, loss_ctc=152.804, loss_att=90.706, acc=0.211, loss=109.335, backward_time=0.056, grad_norm=76.225, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.780e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:10:43,565 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:12:15,452 (trainer:732) INFO: 7epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.107, loss_ctc=144.807, loss_att=85.932, acc=0.211, loss=103.594, backward_time=0.056, grad_norm=75.344, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.752e-05, train_time=0.271 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:13:25,499 (trainer:338) INFO: 7epoch results: [train] iter_time=0.004, forward_time=0.107, loss_ctc=147.966, loss_att=88.490, acc=0.207, loss=106.333, backward_time=0.056, grad_norm=118.795, clip=100.000, loss_scale=91.587, optim_step_time=0.034, optim0_lr0=8.036e-05, train_time=0.266, time=31 minutes and 50.77 seconds, total_count=50127, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=145.370, cer_ctc=0.948, loss_att=82.204, acc=0.240, cer=0.702, wer=1.000, loss=101.154, time=15.43 seconds, total_count=371, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.93 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:13:29,321 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:13:29,323 (trainer:272) INFO: 8/100epoch started. 
Estimated time to finish: 2 days, 2 hours and 55 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:15:03,654 (trainer:732) INFO: 8epoch:train:1-358batch: iter_time=0.003, forward_time=0.104, loss_ctc=150.215, loss_att=88.779, acc=0.213, loss=107.210, backward_time=0.056, grad_norm=68.867, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.724e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:16:37,659 (trainer:732) INFO: 8epoch:train:359-716batch: iter_time=0.002, forward_time=0.105, loss_ctc=150.270, loss_att=88.612, acc=0.215, loss=107.109, backward_time=0.056, grad_norm=74.988, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.697e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:18:11,011 (trainer:732) INFO: 8epoch:train:717-1074batch: iter_time=0.001, forward_time=0.104, loss_ctc=147.237, loss_att=87.356, acc=0.212, loss=105.320, backward_time=0.056, grad_norm=101.280, clip=100.000, loss_scale=85.274, optim_step_time=0.034, optim0_lr0=7.670e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:19:46,177 (trainer:732) INFO: 8epoch:train:1075-1432batch: iter_time=0.004, forward_time=0.105, loss_ctc=148.825, loss_att=87.783, acc=0.217, loss=106.096, backward_time=0.056, grad_norm=100.251, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=7.643e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:21:18,505 (trainer:732) INFO: 8epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.103, loss_ctc=146.862, loss_att=86.790, acc=0.216, loss=104.811, backward_time=0.055, grad_norm=93.128, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=7.617e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:21:50,915 (trainer:663) WARNING: The grad 
norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:22:51,664 (trainer:732) INFO: 8epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.102, loss_ctc=146.547, loss_att=86.651, acc=0.214, loss=104.620, backward_time=0.055, grad_norm=92.041, clip=100.000, loss_scale=86.230, optim_step_time=0.034, optim0_lr0=7.591e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:24:24,976 (trainer:732) INFO: 8epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.104, loss_ctc=154.861, loss_att=91.067, acc=0.215, loss=110.205, backward_time=0.054, grad_norm=113.232, clip=100.000, loss_scale=64.000, optim_step_time=0.033, optim0_lr0=7.565e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:26:01,479 (trainer:732) INFO: 8epoch:train:2507-2864batch: iter_time=0.008, forward_time=0.105, loss_ctc=146.907, loss_att=86.254, acc=0.216, loss=104.450, backward_time=0.056, grad_norm=120.242, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.539e-05, train_time=0.269 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:27:37,720 (trainer:732) INFO: 8epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.106, loss_ctc=151.471, loss_att=89.169, acc=0.215, loss=107.860, backward_time=0.058, grad_norm=126.411, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.513e-05, train_time=0.269 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:29:12,588 (trainer:732) INFO: 8epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.104, loss_ctc=146.911, loss_att=85.917, acc=0.221, loss=104.215, backward_time=0.055, grad_norm=156.307, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.488e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:30:46,770 (trainer:732) INFO: 
8epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.104, loss_ctc=154.220, loss_att=90.742, acc=0.215, loss=109.786, backward_time=0.054, grad_norm=142.130, clip=100.000, loss_scale=68.112, optim_step_time=0.033, optim0_lr0=7.463e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:32:21,210 (trainer:732) INFO: 8epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.103, loss_ctc=147.268, loss_att=86.467, acc=0.217, loss=104.707, backward_time=0.054, grad_norm=114.561, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=7.439e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:33:10,538 (trainer:663) WARNING: The grad norm is nan. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:33:53,916 (trainer:732) INFO: 8epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.100, loss_ctc=146.133, loss_att=85.522, acc=0.218, loss=103.706, backward_time=0.054, grad_norm=106.011, clip=100.000, loss_scale=98.062, optim_step_time=0.033, optim0_lr0=7.414e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:35:26,804 (trainer:732) INFO: 8epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.101, loss_ctc=145.409, loss_att=85.623, acc=0.217, loss=103.559, backward_time=0.054, grad_norm=111.297, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.390e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:36:59,941 (trainer:732) INFO: 8epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.102, loss_ctc=146.148, loss_att=85.822, acc=0.217, loss=103.920, backward_time=0.056, grad_norm=141.822, clip=100.000, loss_scale=64.000, optim_step_time=0.033, optim0_lr0=7.366e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:37:23,500 (preprocessor:336) WARNING: 
The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:38:32,613 (trainer:732) INFO: 8epoch:train:5371-5728batch: iter_time=0.003, forward_time=0.102, loss_ctc=150.157, loss_att=87.849, acc=0.219, loss=106.541, backward_time=0.054, grad_norm=135.049, clip=100.000, loss_scale=64.000, optim_step_time=0.033, optim0_lr0=7.342e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:39:42,826 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:40:06,837 (trainer:732) INFO: 8epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.103, loss_ctc=148.293, loss_att=86.830, acc=0.220, loss=105.269, backward_time=0.054, grad_norm=119.378, clip=100.000, loss_scale=64.000, optim_step_time=0.033, optim0_lr0=7.319e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:41:15,371 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:41:42,546 (trainer:732) INFO: 8epoch:train:6087-6444batch: iter_time=0.011, forward_time=0.103, loss_ctc=140.421, loss_att=82.140, acc=0.222, loss=99.625, backward_time=0.054, grad_norm=142.254, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.296e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:42:14,894 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:43:18,396 (trainer:732) INFO: 8epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.104, loss_ctc=147.653, loss_att=86.024, acc=0.222, loss=104.513, backward_time=0.054, grad_norm=126.558, clip=100.000, loss_scale=77.625, optim_step_time=0.034, optim0_lr0=7.273e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:44:53,290 (trainer:732) INFO: 8epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.102, loss_ctc=147.864, loss_att=86.202, acc=0.220, loss=104.701, backward_time=0.055, grad_norm=100.531, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.250e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:46:01,621 (trainer:338) INFO: 8epoch results: [train] iter_time=0.005, forward_time=0.103, loss_ctc=148.122, loss_att=87.041, acc=0.217, loss=105.365, backward_time=0.055, grad_norm=114.307, clip=100.000, loss_scale=78.359, optim_step_time=0.034, optim0_lr0=7.480e-05, train_time=0.263, time=31 minutes and 24.63 seconds, total_count=57288, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=144.636, cer_ctc=0.939, loss_att=81.318, acc=0.245, cer=0.689, wer=1.000, loss=100.314, time=14.88 seconds, total_count=424, gpu_max_cached_mem_GB=28.451, [att_plot] time=52.78 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:46:05,589 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:46:05,591 (trainer:272) INFO: 9/100epoch started. 
Estimated time to finish: 2 days, 2 hours and 18 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:47:40,625 (trainer:732) INFO: 9epoch:train:1-358batch: iter_time=0.007, forward_time=0.102, loss_ctc=142.840, loss_att=83.365, acc=0.222, loss=101.208, backward_time=0.058, grad_norm=103.847, clip=100.000, loss_scale=64.000, optim_step_time=0.035, optim0_lr0=7.227e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:48:12,763 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:49:13,949 (trainer:732) INFO: 9epoch:train:359-716batch: iter_time=0.003, forward_time=0.102, loss_ctc=146.018, loss_att=85.004, acc=0.222, loss=103.308, backward_time=0.056, grad_norm=133.569, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.205e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:50:46,153 (trainer:732) INFO: 9epoch:train:717-1074batch: iter_time=0.002, forward_time=0.103, loss_ctc=143.841, loss_att=83.782, acc=0.224, loss=101.800, backward_time=0.054, grad_norm=146.785, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=7.182e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:52:12,749 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:52:20,110 (trainer:732) INFO: 9epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.103, loss_ctc=149.950, loss_att=86.755, acc=0.226, loss=105.713, backward_time=0.057, grad_norm=128.232, clip=100.000, loss_scale=69.184, optim_step_time=0.033, optim0_lr0=7.160e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:53:53,230 (trainer:732) INFO: 9epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.102, loss_ctc=149.278, loss_att=86.682, acc=0.223, loss=105.461, backward_time=0.057, grad_norm=117.037, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=7.139e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:55:27,273 (trainer:732) INFO: 9epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.102, loss_ctc=145.355, loss_att=84.180, acc=0.226, loss=102.532, backward_time=0.057, grad_norm=105.165, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=7.117e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:57:01,857 (trainer:732) INFO: 9epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.103, loss_ctc=153.494, loss_att=88.691, acc=0.225, loss=108.132, backward_time=0.058, grad_norm=84.992, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=7.096e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 21:58:34,063 (trainer:732) INFO: 9epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.101, loss_ctc=149.606, loss_att=87.137, acc=0.221, loss=105.878, backward_time=0.056, grad_norm=90.608, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=7.074e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:00:07,496 (trainer:732) INFO: 9epoch:train:2865-3222batch: iter_time=0.003, 
forward_time=0.102, loss_ctc=155.784, loss_att=90.156, acc=0.224, loss=109.844, backward_time=0.055, grad_norm=61.536, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=7.053e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:01:43,028 (trainer:732) INFO: 9epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.103, loss_ctc=149.672, loss_att=87.093, acc=0.224, loss=105.867, backward_time=0.055, grad_norm=48.075, clip=100.000, loss_scale=191.285, optim_step_time=0.034, optim0_lr0=7.033e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:03:17,330 (trainer:732) INFO: 9epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.102, loss_ctc=148.068, loss_att=86.060, acc=0.225, loss=104.662, backward_time=0.055, grad_norm=75.061, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=7.012e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:04:33,489 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:04:50,067 (trainer:732) INFO: 9epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.101, loss_ctc=148.091, loss_att=86.061, acc=0.225, loss=104.670, backward_time=0.055, grad_norm=94.112, clip=100.000, loss_scale=233.412, optim_step_time=0.034, optim0_lr0=6.991e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:06:24,337 (trainer:732) INFO: 9epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.102, loss_ctc=150.991, loss_att=87.468, acc=0.226, loss=106.525, backward_time=0.055, grad_norm=103.349, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.971e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:07:58,206 (trainer:732) INFO: 9epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.101, loss_ctc=147.572, loss_att=85.515, acc=0.225, loss=104.132, backward_time=0.056, grad_norm=88.136, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.951e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:09:31,798 (trainer:732) INFO: 9epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.101, loss_ctc=149.236, loss_att=86.535, acc=0.227, loss=105.345, backward_time=0.055, grad_norm=80.095, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.931e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:11:07,162 (trainer:732) INFO: 9epoch:train:5371-5728batch: iter_time=0.012, forward_time=0.101, loss_ctc=144.014, loss_att=83.696, acc=0.228, loss=101.791, backward_time=0.055, grad_norm=63.313, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.911e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:12:43,373 (trainer:732) INFO: 9epoch:train:5729-6086batch: iter_time=0.011, 
forward_time=0.102, loss_ctc=143.924, loss_att=83.687, acc=0.228, loss=101.758, backward_time=0.055, grad_norm=64.540, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.892e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:14:13,142 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:14:20,234 (trainer:732) INFO: 9epoch:train:6087-6444batch: iter_time=0.011, forward_time=0.103, loss_ctc=145.039, loss_att=84.040, acc=0.228, loss=102.340, backward_time=0.056, grad_norm=74.781, clip=100.000, loss_scale=193.613, optim_step_time=0.035, optim0_lr0=6.872e-05, train_time=0.270 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:14:59,498 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:15:56,267 (trainer:732) INFO: 9epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.102, loss_ctc=149.287, loss_att=86.763, acc=0.226, loss=105.520, backward_time=0.056, grad_norm=75.489, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.853e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:17:31,262 (trainer:732) INFO: 9epoch:train:6803-7160batch: iter_time=0.011, forward_time=0.101, loss_ctc=144.913, loss_att=84.182, acc=0.228, loss=102.401, backward_time=0.054, grad_norm=72.213, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.834e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:18:39,448 (trainer:338) INFO: 9epoch results: [train] iter_time=0.006, forward_time=0.102, loss_ctc=147.803, loss_att=85.818, acc=0.225, loss=104.414, backward_time=0.056, grad_norm=90.544, 
clip=100.000, loss_scale=133.552, optim_step_time=0.034, optim0_lr0=7.025e-05, train_time=0.263, time=31 minutes and 26.4 seconds, total_count=64449, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=143.794, cer_ctc=0.945, loss_att=79.893, acc=0.257, cer=0.683, wer=1.000, loss=99.063, time=14.74 seconds, total_count=477, gpu_max_cached_mem_GB=28.451, [att_plot] time=52.71 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:18:43,288 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:18:43,290 (trainer:272) INFO: 10/100epoch started. Estimated time to finish: 2 days, 1 hour and 43 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:20:17,667 (trainer:732) INFO: 10epoch:train:1-358batch: iter_time=0.006, forward_time=0.102, loss_ctc=148.846, loss_att=85.944, acc=0.230, loss=104.815, backward_time=0.054, grad_norm=72.676, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=6.815e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:21:50,060 (trainer:732) INFO: 10epoch:train:359-716batch: iter_time=0.002, forward_time=0.101, loss_ctc=146.172, loss_att=84.953, acc=0.228, loss=103.319, backward_time=0.054, grad_norm=76.768, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=6.796e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:22:01,440 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:23:22,693 (trainer:732) INFO: 10epoch:train:717-1074batch: iter_time=0.003, forward_time=0.101, loss_ctc=150.029, loss_att=86.628, acc=0.231, loss=105.648, backward_time=0.055, grad_norm=92.204, clip=100.000, loss_scale=71.709, optim_step_time=0.034, optim0_lr0=6.777e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:24:55,212 (trainer:732) INFO: 10epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.102, loss_ctc=143.966, loss_att=82.993, acc=0.234, loss=101.285, backward_time=0.055, grad_norm=70.373, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=6.759e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:26:28,193 (trainer:732) INFO: 10epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.101, loss_ctc=145.485, loss_att=84.027, acc=0.232, loss=102.464, backward_time=0.054, grad_norm=57.391, clip=100.000, loss_scale=64.000, optim_step_time=0.033, optim0_lr0=6.741e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:28:01,409 (trainer:732) INFO: 10epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.102, loss_ctc=149.686, loss_att=86.548, acc=0.231, loss=105.489, backward_time=0.054, grad_norm=63.726, clip=100.000, loss_scale=64.000, optim_step_time=0.033, optim0_lr0=6.722e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:29:34,151 (trainer:732) INFO: 10epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.101, loss_ctc=148.671, loss_att=86.249, acc=0.230, loss=104.976, backward_time=0.053, grad_norm=72.890, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=6.704e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:31:08,055 (trainer:732) INFO: 10epoch:train:2507-2864batch: iter_time=0.006, 
forward_time=0.100, loss_ctc=144.965, loss_att=83.929, acc=0.233, loss=102.240, backward_time=0.054, grad_norm=59.726, clip=100.000, loss_scale=82.592, optim_step_time=0.034, optim0_lr0=6.686e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:31:49,201 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:32:40,156 (trainer:732) INFO: 10epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.099, loss_ctc=151.709, loss_att=87.899, acc=0.231, loss=107.042, backward_time=0.054, grad_norm=91.833, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=6.669e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:34:12,206 (trainer:732) INFO: 10epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.099, loss_ctc=147.591, loss_att=85.434, acc=0.233, loss=104.081, backward_time=0.056, grad_norm=71.724, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.651e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:35:46,064 (trainer:732) INFO: 10epoch:train:3581-3938batch: iter_time=0.009, forward_time=0.099, loss_ctc=146.571, loss_att=85.194, acc=0.231, loss=103.607, backward_time=0.055, grad_norm=77.440, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=6.634e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:37:19,802 (trainer:732) INFO: 10epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.100, loss_ctc=149.412, loss_att=86.319, acc=0.234, loss=105.247, backward_time=0.055, grad_norm=59.155, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.616e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 
2023-05-12 22:38:53,260 (trainer:732) INFO: 10epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.100, loss_ctc=147.760, loss_att=85.882, acc=0.232, loss=104.445, backward_time=0.054, grad_norm=55.833, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.599e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:40:31,179 (trainer:732) INFO: 10epoch:train:4655-5012batch: iter_time=0.014, forward_time=0.102, loss_ctc=145.407, loss_att=84.456, acc=0.234, loss=102.741, backward_time=0.056, grad_norm=51.632, clip=100.000, loss_scale=218.101, optim_step_time=0.034, optim0_lr0=6.582e-05, train_time=0.273 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:40:44,867 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:42:06,375 (trainer:732) INFO: 10epoch:train:5013-5370batch: iter_time=0.009, forward_time=0.101, loss_ctc=143.175, loss_att=83.247, acc=0.234, loss=101.226, backward_time=0.057, grad_norm=63.186, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=6.565e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:43:38,288 (trainer:732) INFO: 10epoch:train:5371-5728batch: iter_time=0.007, forward_time=0.098, loss_ctc=140.009, loss_att=81.587, acc=0.235, loss=99.113, backward_time=0.055, grad_norm=72.287, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=6.548e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:45:13,562 (trainer:732) INFO: 10epoch:train:5729-6086batch: iter_time=0.012, forward_time=0.099, loss_ctc=143.750, loss_att=83.881, acc=0.233, loss=101.842, backward_time=0.055, grad_norm=64.925, clip=100.000, loss_scale=256.000, optim_step_time=0.034, 
optim0_lr0=6.531e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:46:49,506 (trainer:732) INFO: 10epoch:train:6087-6444batch: iter_time=0.013, forward_time=0.100, loss_ctc=143.386, loss_att=83.268, acc=0.236, loss=101.303, backward_time=0.055, grad_norm=73.517, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=6.515e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:47:55,945 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:48:23,322 (trainer:732) INFO: 10epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.100, loss_ctc=143.805, loss_att=83.276, acc=0.238, loss=101.435, backward_time=0.055, grad_norm=67.780, clip=100.000, loss_scale=286.034, optim_step_time=0.034, optim0_lr0=6.498e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:48:48,310 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:49:08,748 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:50:00,227 (trainer:732) INFO: 10epoch:train:6803-7160batch: iter_time=0.018, forward_time=0.099, loss_ctc=137.576, loss_att=80.114, acc=0.236, loss=97.353, backward_time=0.055, grad_norm=71.815, clip=100.000, loss_scale=257.798, optim_step_time=0.034, optim0_lr0=6.482e-05, train_time=0.270 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:51:09,171 (trainer:338) INFO: 10epoch results: [train] iter_time=0.007, forward_time=0.100, loss_ctc=145.858, loss_att=84.569, acc=0.233, loss=102.956, backward_time=0.055, grad_norm=69.336, clip=100.000, loss_scale=154.591, optim_step_time=0.034, optim0_lr0=6.645e-05, train_time=0.262, time=31 minutes and 17.65 seconds, total_count=71610, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=140.411, cer_ctc=0.940, loss_att=78.560, acc=0.266, cer=0.676, wer=1.000, loss=97.115, time=14.65 seconds, total_count=530, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.58 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:51:13,083 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:51:13,086 (trainer:272) INFO: 11/100epoch started. 
Estimated time to finish: 2 days, 1 hour and 7 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:52:46,945 (trainer:732) INFO: 11epoch:train:1-358batch: iter_time=0.003, forward_time=0.102, loss_ctc=148.548, loss_att=86.012, acc=0.236, loss=104.773, backward_time=0.054, grad_norm=71.031, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=6.466e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:52:50,642 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:54:19,295 (trainer:732) INFO: 11epoch:train:359-716batch: iter_time=0.002, forward_time=0.101, loss_ctc=142.255, loss_att=82.767, acc=0.235, loss=100.613, backward_time=0.055, grad_norm=61.746, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.450e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:55:36,241 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:55:52,550 (trainer:732) INFO: 11epoch:train:717-1074batch: iter_time=6.797e-04, forward_time=0.102, loss_ctc=148.565, loss_att=85.857, acc=0.238, loss=104.670, backward_time=0.056, grad_norm=55.329, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.434e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:57:25,135 (trainer:732) INFO: 11epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.100, loss_ctc=145.067, loss_att=83.752, acc=0.241, loss=102.147, backward_time=0.055, grad_norm=55.031, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.418e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:58:48,503 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 22:58:55,652 (trainer:732) INFO: 11epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.098, loss_ctc=142.023, loss_att=82.474, acc=0.238, loss=100.339, backward_time=0.054, grad_norm=61.021, clip=100.000, loss_scale=122.801, optim_step_time=0.033, optim0_lr0=6.402e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:00:27,518 (trainer:732) INFO: 11epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.099, loss_ctc=141.419, loss_att=81.885, acc=0.241, loss=99.745, backward_time=0.055, grad_norm=54.470, clip=100.000, loss_scale=64.000, optim_step_time=0.033, optim0_lr0=6.387e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:02:01,205 (trainer:732) INFO: 11epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.102, loss_ctc=143.799, loss_att=83.273, acc=0.240, loss=101.431, backward_time=0.055, grad_norm=61.393, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=6.371e-05, train_time=0.261 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:03:10,875 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:03:32,384 (trainer:732) INFO: 11epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.099, loss_ctc=141.050, loss_att=81.658, acc=0.242, loss=99.476, backward_time=0.054, grad_norm=65.857, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=6.356e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:05:06,123 (trainer:732) INFO: 11epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.102, loss_ctc=141.886, loss_att=82.302, acc=0.240, loss=100.177, backward_time=0.056, grad_norm=67.491, clip=100.000, loss_scale=64.000, optim_step_time=0.034, optim0_lr0=6.341e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:06:38,702 (trainer:732) INFO: 11epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.100, loss_ctc=140.678, loss_att=81.279, acc=0.243, loss=99.099, backward_time=0.055, grad_norm=65.608, clip=100.000, loss_scale=64.000, optim_step_time=0.033, optim0_lr0=6.326e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:08:12,880 (trainer:732) INFO: 11epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.102, loss_ctc=141.504, loss_att=81.732, acc=0.245, loss=99.663, backward_time=0.055, grad_norm=54.771, clip=100.000, loss_scale=95.642, optim_step_time=0.034, optim0_lr0=6.311e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:09:47,146 (trainer:732) INFO: 11epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.105, loss_ctc=140.784, loss_att=81.554, acc=0.244, loss=99.323, backward_time=0.054, grad_norm=65.972, clip=100.000, 
loss_scale=128.000, optim_step_time=0.033, optim0_lr0=6.296e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:11:21,656 (trainer:732) INFO: 11epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.106, loss_ctc=143.464, loss_att=82.683, acc=0.247, loss=100.917, backward_time=0.054, grad_norm=77.795, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=6.281e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:12:57,180 (trainer:732) INFO: 11epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.107, loss_ctc=142.187, loss_att=82.103, acc=0.246, loss=100.128, backward_time=0.054, grad_norm=63.770, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=6.266e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:14:33,608 (trainer:732) INFO: 11epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.107, loss_ctc=143.137, loss_att=82.397, acc=0.248, loss=100.619, backward_time=0.057, grad_norm=73.304, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=6.251e-05, train_time=0.269 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:16:06,774 (trainer:732) INFO: 11epoch:train:5371-5728batch: iter_time=0.003, forward_time=0.104, loss_ctc=141.816, loss_att=81.966, acc=0.247, loss=99.921, backward_time=0.055, grad_norm=66.763, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.237e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:17:41,766 (trainer:732) INFO: 11epoch:train:5729-6086batch: iter_time=0.002, forward_time=0.107, loss_ctc=149.413, loss_att=86.012, acc=0.248, loss=105.032, backward_time=0.055, grad_norm=70.027, clip=100.000, loss_scale=244.201, optim_step_time=0.033, optim0_lr0=6.222e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 
23:19:17,772 (trainer:732) INFO: 11epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.107, loss_ctc=138.023, loss_att=79.936, acc=0.247, loss=97.362, backward_time=0.055, grad_norm=82.661, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=6.208e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:20:54,706 (trainer:732) INFO: 11epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.107, loss_ctc=137.023, loss_att=79.163, acc=0.250, loss=96.521, backward_time=0.056, grad_norm=73.301, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=6.194e-05, train_time=0.270 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:22:29,794 (trainer:732) INFO: 11epoch:train:6803-7160batch: iter_time=0.006, forward_time=0.105, loss_ctc=143.605, loss_att=83.141, acc=0.246, loss=101.281, backward_time=0.055, grad_norm=95.368, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=6.180e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:23:39,692 (trainer:338) INFO: 11epoch results: [train] iter_time=0.004, forward_time=0.103, loss_ctc=142.778, loss_att=82.578, acc=0.243, loss=100.638, backward_time=0.055, grad_norm=67.138, clip=100.000, loss_scale=135.151, optim_step_time=0.034, optim0_lr0=6.320e-05, train_time=0.262, time=31 minutes and 17.41 seconds, total_count=78771, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=135.892, cer_ctc=0.925, loss_att=75.529, acc=0.285, cer=0.668, wer=1.000, loss=93.638, time=15.04 seconds, total_count=583, gpu_max_cached_mem_GB=28.451, [att_plot] time=54.15 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:23:43,247 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:23:43,262 (trainer:440) INFO: The model files were 
removed: exp/asr_train_raw_bpe2000_sp/1epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:23:43,263 (trainer:272) INFO: 12/100epoch started. Estimated time to finish: 2 days, 32 minutes and 4.63 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:25:15,785 (trainer:732) INFO: 12epoch:train:1-358batch: iter_time=0.006, forward_time=0.100, loss_ctc=134.569, loss_att=77.468, acc=0.253, loss=94.598, backward_time=0.054, grad_norm=82.378, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=6.166e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:26:47,658 (trainer:732) INFO: 12epoch:train:359-716batch: iter_time=4.510e-04, forward_time=0.101, loss_ctc=145.127, loss_att=83.288, acc=0.250, loss=101.840, backward_time=0.054, grad_norm=95.854, clip=100.000, loss_scale=338.950, optim_step_time=0.033, optim0_lr0=6.152e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:28:19,832 (trainer:732) INFO: 12epoch:train:717-1074batch: iter_time=0.003, forward_time=0.100, loss_ctc=143.364, loss_att=82.146, acc=0.254, loss=100.511, backward_time=0.054, grad_norm=94.748, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=6.138e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:28:43,733 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:29:52,216 (trainer:732) INFO: 12epoch:train:1075-1432batch: iter_time=0.006, forward_time=0.100, loss_ctc=136.027, loss_att=78.305, acc=0.252, loss=95.622, backward_time=0.054, grad_norm=93.854, clip=100.000, loss_scale=323.406, optim_step_time=0.034, optim0_lr0=6.124e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:31:25,202 (trainer:732) INFO: 12epoch:train:1433-1790batch: iter_time=0.006, forward_time=0.100, loss_ctc=133.897, loss_att=77.125, acc=0.253, loss=94.156, backward_time=0.054, grad_norm=91.543, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=6.111e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:33:00,791 (trainer:732) INFO: 12epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.104, loss_ctc=137.338, loss_att=79.055, acc=0.254, loss=96.540, backward_time=0.057, grad_norm=83.030, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=6.097e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:34:34,325 (trainer:732) INFO: 12epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.102, loss_ctc=140.138, loss_att=80.574, acc=0.254, loss=98.443, backward_time=0.057, grad_norm=88.188, clip=100.000, loss_scale=256.000, optim_step_time=0.035, optim0_lr0=6.084e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:36:08,880 (trainer:732) INFO: 12epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.103, loss_ctc=144.233, loss_att=83.104, acc=0.252, loss=101.442, backward_time=0.057, grad_norm=87.095, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=6.070e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:36:28,037 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:37:41,837 (trainer:732) INFO: 12epoch:train:2865-3222batch: iter_time=0.006, forward_time=0.099, loss_ctc=140.435, loss_att=80.868, acc=0.254, loss=98.738, backward_time=0.056, grad_norm=95.358, clip=100.000, loss_scale=293.899, optim_step_time=0.034, optim0_lr0=6.057e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:39:03,159 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:39:14,534 (trainer:732) INFO: 12epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.100, loss_ctc=140.747, loss_att=80.908, acc=0.255, loss=98.860, backward_time=0.058, grad_norm=101.352, clip=100.000, loss_scale=480.448, optim_step_time=0.034, optim0_lr0=6.044e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:40:49,681 (trainer:732) INFO: 12epoch:train:3581-3938batch: iter_time=0.011, forward_time=0.100, loss_ctc=136.613, loss_att=78.755, acc=0.256, loss=96.112, backward_time=0.057, grad_norm=105.898, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=6.031e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:42:24,472 (trainer:732) INFO: 12epoch:train:3939-4296batch: iter_time=0.008, forward_time=0.101, loss_ctc=137.195, loss_att=79.129, acc=0.256, loss=96.549, backward_time=0.056, grad_norm=130.858, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=6.017e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:42:24,743 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:43:58,931 (trainer:732) INFO: 12epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.102, loss_ctc=140.544, loss_att=80.893, acc=0.257, loss=98.789, backward_time=0.057, grad_norm=115.516, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=6.005e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:45:34,287 (trainer:732) INFO: 12epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.102, loss_ctc=141.627, loss_att=81.644, acc=0.256, loss=99.639, backward_time=0.056, grad_norm=127.320, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=5.992e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:47:07,965 (trainer:732) INFO: 12epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.100, loss_ctc=141.584, loss_att=81.378, acc=0.257, loss=99.440, backward_time=0.058, grad_norm=110.415, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=5.979e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:47:45,182 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:48:43,999 (trainer:732) INFO: 12epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.103, loss_ctc=140.675, loss_att=80.960, acc=0.257, loss=98.875, backward_time=0.058, grad_norm=106.846, clip=100.000, loss_scale=128.000, optim_step_time=0.034, optim0_lr0=5.966e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:50:18,649 (trainer:732) INFO: 12epoch:train:5729-6086batch: iter_time=0.010, forward_time=0.100, loss_ctc=133.254, loss_att=76.679, acc=0.259, loss=93.651, backward_time=0.056, grad_norm=110.522, clip=100.000, loss_scale=128.000, optim_step_time=0.035, optim0_lr0=5.954e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:51:50,771 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:51:54,887 (trainer:732) INFO: 12epoch:train:6087-6444batch: iter_time=0.010, forward_time=0.102, loss_ctc=143.218, loss_att=82.495, acc=0.257, loss=100.712, backward_time=0.055, grad_norm=113.166, clip=100.000, loss_scale=180.559, optim_step_time=0.034, optim0_lr0=5.941e-05, train_time=0.269 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:53:31,033 (trainer:732) INFO: 12epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.102, loss_ctc=142.863, loss_att=82.115, acc=0.259, loss=100.339, backward_time=0.057, grad_norm=120.782, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=5.929e-05, train_time=0.268 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:55:08,270 (trainer:732) INFO: 12epoch:train:6803-7160batch: iter_time=0.010, forward_time=0.103, loss_ctc=143.278, loss_att=82.395, acc=0.259, loss=100.660, backward_time=0.057, grad_norm=113.659, 
clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=5.916e-05, train_time=0.271 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:56:18,809 (trainer:338) INFO: 12epoch results: [train] iter_time=0.006, forward_time=0.101, loss_ctc=139.755, loss_att=80.418, acc=0.255, loss=98.219, backward_time=0.056, grad_norm=103.418, clip=100.000, loss_scale=253.640, optim_step_time=0.034, optim0_lr0=6.038e-05, train_time=0.263, time=31 minutes and 25.67 seconds, total_count=85932, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=132.734, cer_ctc=0.917, loss_att=73.631, acc=0.295, cer=0.650, wer=1.000, loss=91.362, time=15.12 seconds, total_count=636, gpu_max_cached_mem_GB=28.451, [att_plot] time=54.74 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:56:22,364 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:56:22,380 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/2epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:56:22,381 (trainer:272) INFO: 13/100epoch started. 
Estimated time to finish: 1 day, 23 hours and 58 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:57:56,114 (trainer:732) INFO: 13epoch:train:1-358batch: iter_time=0.005, forward_time=0.101, loss_ctc=136.577, loss_att=78.496, acc=0.259, loss=95.920, backward_time=0.057, grad_norm=114.590, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=5.904e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-12 23:59:30,321 (trainer:732) INFO: 13epoch:train:359-716batch: iter_time=0.002, forward_time=0.102, loss_ctc=142.721, loss_att=81.908, acc=0.260, loss=100.152, backward_time=0.057, grad_norm=123.939, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=5.892e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:00:29,950 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:01:04,457 (trainer:732) INFO: 13epoch:train:717-1074batch: iter_time=0.003, forward_time=0.102, loss_ctc=140.234, loss_att=80.812, acc=0.258, loss=98.639, backward_time=0.058, grad_norm=115.018, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=5.879e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:02:35,608 (trainer:732) INFO: 13epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.099, loss_ctc=136.828, loss_att=78.470, acc=0.265, loss=95.977, backward_time=0.053, grad_norm=124.201, clip=100.000, loss_scale=467.665, optim_step_time=0.033, optim0_lr0=5.867e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:03:40,130 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:04:08,536 (trainer:732) INFO: 13epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.101, loss_ctc=139.203, loss_att=80.185, acc=0.262, loss=97.891, backward_time=0.053, grad_norm=119.739, clip=100.000, loss_scale=431.686, optim_step_time=0.033, optim0_lr0=5.855e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:05:41,096 (trainer:732) INFO: 13epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.101, loss_ctc=137.300, loss_att=79.001, acc=0.264, loss=96.491, backward_time=0.054, grad_norm=118.085, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.843e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:07:12,400 (trainer:732) INFO: 13epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.100, loss_ctc=136.995, loss_att=78.737, acc=0.265, loss=96.214, backward_time=0.053, grad_norm=122.837, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.831e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:08:43,345 (trainer:732) INFO: 13epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.099, loss_ctc=140.635, loss_att=80.643, acc=0.264, loss=98.641, backward_time=0.054, grad_norm=135.640, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.820e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:10:13,968 (trainer:732) INFO: 13epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.098, loss_ctc=140.196, loss_att=80.295, acc=0.268, loss=98.265, backward_time=0.053, grad_norm=117.699, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.808e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:11:47,160 (trainer:732) INFO: 13epoch:train:3223-3580batch: iter_time=0.009, 
forward_time=0.099, loss_ctc=132.714, loss_att=76.487, acc=0.266, loss=93.355, backward_time=0.053, grad_norm=127.801, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.796e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:13:18,908 (trainer:732) INFO: 13epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.099, loss_ctc=135.754, loss_att=78.273, acc=0.265, loss=95.517, backward_time=0.053, grad_norm=127.074, clip=100.000, loss_scale=441.922, optim_step_time=0.033, optim0_lr0=5.785e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:14:49,531 (trainer:732) INFO: 13epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.097, loss_ctc=134.107, loss_att=77.576, acc=0.264, loss=94.535, backward_time=0.053, grad_norm=114.164, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=5.773e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:16:21,864 (trainer:732) INFO: 13epoch:train:4297-4654batch: iter_time=0.007, forward_time=0.099, loss_ctc=131.950, loss_att=76.312, acc=0.267, loss=93.004, backward_time=0.053, grad_norm=120.495, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=5.762e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:16:25,396 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:17:12,940 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:17:54,233 (trainer:732) INFO: 13epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.099, loss_ctc=131.059, loss_att=75.362, acc=0.271, loss=92.071, backward_time=0.055, grad_norm=131.293, clip=100.000, loss_scale=396.549, optim_step_time=0.033, optim0_lr0=5.750e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:19:14,592 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:19:27,013 (trainer:732) INFO: 13epoch:train:5013-5370batch: iter_time=0.008, forward_time=0.099, loss_ctc=133.977, loss_att=77.139, acc=0.270, loss=94.191, backward_time=0.053, grad_norm=118.375, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.739e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:20:58,023 (trainer:732) INFO: 13epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.098, loss_ctc=135.861, loss_att=78.359, acc=0.268, loss=95.610, backward_time=0.054, grad_norm=121.226, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.728e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:22:29,716 (trainer:732) INFO: 13epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.098, loss_ctc=138.043, loss_att=79.727, acc=0.267, loss=97.222, backward_time=0.053, grad_norm=119.398, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.717e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:24:03,448 (trainer:732) INFO: 13epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.099, loss_ctc=132.753, loss_att=76.383, acc=0.272, loss=93.294, backward_time=0.053, grad_norm=117.921, 
clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.706e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:25:35,599 (trainer:732) INFO: 13epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.097, loss_ctc=135.650, loss_att=78.307, acc=0.269, loss=95.510, backward_time=0.053, grad_norm=127.448, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.695e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:27:07,761 (trainer:732) INFO: 13epoch:train:6803-7160batch: iter_time=0.006, forward_time=0.099, loss_ctc=140.774, loss_att=81.039, acc=0.270, loss=98.959, backward_time=0.054, grad_norm=126.479, clip=100.000, loss_scale=476.961, optim_step_time=0.033, optim0_lr0=5.684e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:28:14,632 (trainer:338) INFO: 13epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=136.622, loss_att=78.650, acc=0.266, loss=96.042, backward_time=0.054, grad_norm=122.170, clip=100.000, loss_scale=328.341, optim_step_time=0.033, optim0_lr0=5.792e-05, train_time=0.257, time=30 minutes and 46.11 seconds, total_count=93093, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=128.100, cer_ctc=0.912, loss_att=71.650, acc=0.309, cer=0.644, wer=1.000, loss=88.585, time=14.35 seconds, total_count=689, gpu_max_cached_mem_GB=28.451, [att_plot] time=51.79 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:28:18,073 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:28:18,089 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/3epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:28:18,089 (trainer:272) INFO: 14/100epoch started. 
Estimated time to finish: 1 day, 23 hours and 19 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:29:44,252 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:29:48,851 (trainer:732) INFO: 14epoch:train:1-358batch: iter_time=0.002, forward_time=0.099, loss_ctc=132.850, loss_att=76.119, acc=0.274, loss=93.138, backward_time=0.053, grad_norm=127.859, clip=100.000, loss_scale=498.375, optim_step_time=0.033, optim0_lr0=5.673e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:31:20,924 (trainer:732) INFO: 14epoch:train:359-716batch: iter_time=0.001, forward_time=0.101, loss_ctc=137.948, loss_att=79.755, acc=0.270, loss=97.213, backward_time=0.053, grad_norm=126.862, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.662e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:31:25,878 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:32:51,935 (trainer:732) INFO: 14epoch:train:717-1074batch: iter_time=9.487e-04, forward_time=0.100, loss_ctc=137.373, loss_att=79.204, acc=0.273, loss=96.655, backward_time=0.053, grad_norm=129.096, clip=100.000, loss_scale=134.812, optim_step_time=0.033, optim0_lr0=5.651e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:34:23,810 (trainer:732) INFO: 14epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.101, loss_ctc=133.726, loss_att=76.932, acc=0.273, loss=93.970, backward_time=0.054, grad_norm=132.532, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.640e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:35:54,772 (trainer:732) INFO: 14epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.099, loss_ctc=139.235, loss_att=79.941, acc=0.275, loss=97.729, backward_time=0.053, grad_norm=143.467, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.630e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:37:26,685 (trainer:732) INFO: 14epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.102, loss_ctc=132.120, loss_att=76.197, acc=0.275, loss=92.974, backward_time=0.053, grad_norm=127.740, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.619e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:38:59,562 (trainer:732) INFO: 14epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.105, loss_ctc=134.211, loss_att=77.276, acc=0.276, loss=94.356, backward_time=0.053, grad_norm=153.944, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.608e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:40:32,583 (trainer:732) INFO: 14epoch:train:2507-2864batch: iter_time=0.002, 
forward_time=0.105, loss_ctc=134.763, loss_att=77.544, acc=0.277, loss=94.710, backward_time=0.053, grad_norm=121.856, clip=100.000, loss_scale=173.765, optim_step_time=0.033, optim0_lr0=5.598e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:42:04,241 (trainer:732) INFO: 14epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.102, loss_ctc=129.520, loss_att=74.724, acc=0.277, loss=91.162, backward_time=0.055, grad_norm=138.857, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.587e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:42:45,759 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:43:38,752 (trainer:732) INFO: 14epoch:train:3223-3580batch: iter_time=0.002, forward_time=0.107, loss_ctc=138.766, loss_att=80.050, acc=0.276, loss=97.665, backward_time=0.053, grad_norm=126.361, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.577e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:45:12,258 (trainer:732) INFO: 14epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.105, loss_ctc=131.924, loss_att=76.059, acc=0.278, loss=92.818, backward_time=0.053, grad_norm=130.110, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.567e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:46:45,442 (trainer:732) INFO: 14epoch:train:3939-4296batch: iter_time=0.002, forward_time=0.105, loss_ctc=133.721, loss_att=77.057, acc=0.279, loss=94.056, backward_time=0.053, grad_norm=154.382, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.556e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 
2023-05-13 00:48:19,421 (trainer:732) INFO: 14epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.105, loss_ctc=124.752, loss_att=72.129, acc=0.280, loss=87.916, backward_time=0.054, grad_norm=134.682, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.546e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:49:53,551 (trainer:732) INFO: 14epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.106, loss_ctc=126.571, loss_att=73.536, acc=0.279, loss=89.447, backward_time=0.054, grad_norm=133.676, clip=100.000, loss_scale=453.363, optim_step_time=0.033, optim0_lr0=5.536e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:50:10,693 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:50:47,344 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:51:27,080 (trainer:732) INFO: 14epoch:train:5013-5370batch: iter_time=0.002, forward_time=0.106, loss_ctc=133.608, loss_att=77.291, acc=0.279, loss=94.186, backward_time=0.053, grad_norm=138.629, clip=100.000, loss_scale=402.286, optim_step_time=0.033, optim0_lr0=5.526e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:53:00,527 (trainer:732) INFO: 14epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.104, loss_ctc=130.434, loss_att=75.331, acc=0.283, loss=91.862, backward_time=0.053, grad_norm=143.661, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.516e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:54:34,972 (trainer:732) INFO: 14epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.106, loss_ctc=130.929, loss_att=75.775, acc=0.281, loss=92.321, backward_time=0.053, grad_norm=156.071, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.506e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:55:11,920 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:56:08,911 (trainer:732) INFO: 14epoch:train:6087-6444batch: iter_time=0.003, forward_time=0.105, loss_ctc=134.454, loss_att=77.821, acc=0.282, loss=94.811, backward_time=0.053, grad_norm=153.711, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.496e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:57:42,221 (trainer:732) INFO: 14epoch:train:6445-6802batch: iter_time=0.004, forward_time=0.105, loss_ctc=130.940, loss_att=75.257, acc=0.287, loss=91.962, backward_time=0.053, grad_norm=153.334, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.486e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 00:59:14,582 (trainer:732) INFO: 14epoch:train:6803-7160batch: iter_time=0.004, forward_time=0.103, loss_ctc=130.060, loss_att=75.126, acc=0.284, loss=91.606, backward_time=0.053, grad_norm=137.320, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.476e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:00:22,698 (trainer:338) INFO: 14epoch results: [train] iter_time=0.003, forward_time=0.104, loss_ctc=132.830, loss_att=76.621, acc=0.278, loss=93.484, backward_time=0.053, grad_norm=138.201, clip=100.000, loss_scale=249.491, optim_step_time=0.033, optim0_lr0=5.573e-05, train_time=0.259, time=30 minutes and 57.24 seconds, total_count=100254, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=121.652, cer_ctc=0.909, loss_att=68.499, acc=0.330, cer=0.609, wer=1.000, loss=84.445, time=14.86 seconds, total_count=742, gpu_max_cached_mem_GB=28.451, [att_plot] time=52.51 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:00:26,415 (trainer:386) INFO: The best model has been updated: valid.acc 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:00:26,430 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/4epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:00:26,430 (trainer:272) INFO: 15/100epoch started. Estimated time to finish: 1 day, 22 hours and 43 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:00:50,720 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:01:49,565 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:01:58,581 (trainer:732) INFO: 15epoch:train:1-358batch: iter_time=0.004, forward_time=0.100, loss_ctc=129.510, loss_att=74.389, acc=0.290, loss=90.925, backward_time=0.053, grad_norm=134.917, clip=100.000, loss_scale=281.815, optim_step_time=0.033, optim0_lr0=5.467e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:02:14,415 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:03:06,944 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:03:29,580 (trainer:732) INFO: 15epoch:train:359-716batch: iter_time=0.002, forward_time=0.100, loss_ctc=131.147, loss_att=75.544, acc=0.288, loss=92.225, backward_time=0.053, grad_norm=150.780, clip=100.000, loss_scale=149.871, optim_step_time=0.033, optim0_lr0=5.457e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:05:00,490 (trainer:732) INFO: 15epoch:train:717-1074batch: iter_time=0.002, forward_time=0.100, loss_ctc=130.460, loss_att=75.159, acc=0.288, loss=91.749, backward_time=0.053, grad_norm=164.931, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.447e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:06:31,673 (trainer:732) INFO: 15epoch:train:1075-1432batch: iter_time=6.448e-04, forward_time=0.101, loss_ctc=132.863, loss_att=76.693, acc=0.288, loss=93.544, backward_time=0.053, grad_norm=152.811, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.438e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:08:02,866 (trainer:732) INFO: 15epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.100, loss_ctc=128.689, loss_att=74.274, acc=0.291, loss=90.598, backward_time=0.055, grad_norm=150.308, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.428e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:09:33,890 (trainer:732) INFO: 15epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=128.503, loss_att=74.564, acc=0.289, loss=90.745, backward_time=0.055, grad_norm=143.544, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.419e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:11:06,489 (trainer:732) INFO: 15epoch:train:2149-2506batch: iter_time=0.005, 
forward_time=0.100, loss_ctc=128.447, loss_att=74.428, acc=0.291, loss=90.633, backward_time=0.053, grad_norm=137.253, clip=100.000, loss_scale=158.749, optim_step_time=0.033, optim0_lr0=5.409e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:11:10,143 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:12:37,791 (trainer:732) INFO: 15epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.099, loss_ctc=128.865, loss_att=74.673, acc=0.292, loss=90.931, backward_time=0.053, grad_norm=146.613, clip=100.000, loss_scale=133.020, optim_step_time=0.033, optim0_lr0=5.400e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:14:09,326 (trainer:732) INFO: 15epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.098, loss_ctc=127.317, loss_att=73.647, acc=0.295, loss=89.748, backward_time=0.054, grad_norm=147.725, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.390e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:15:40,653 (trainer:732) INFO: 15epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.098, loss_ctc=126.354, loss_att=73.588, acc=0.292, loss=89.418, backward_time=0.053, grad_norm=146.117, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.381e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:16:45,070 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:17:12,233 (trainer:732) INFO: 15epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.098, loss_ctc=127.100, loss_att=74.081, acc=0.293, loss=89.987, backward_time=0.053, grad_norm=161.298, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.372e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:18:44,988 (trainer:732) INFO: 15epoch:train:3939-4296batch: iter_time=0.007, forward_time=0.099, loss_ctc=127.700, loss_att=74.372, acc=0.294, loss=90.370, backward_time=0.053, grad_norm=159.043, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.363e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:20:19,991 (trainer:732) INFO: 15epoch:train:4297-4654batch: iter_time=0.011, forward_time=0.100, loss_ctc=125.842, loss_att=73.182, acc=0.297, loss=88.980, backward_time=0.053, grad_norm=165.698, clip=100.000, loss_scale=175.553, optim_step_time=0.033, optim0_lr0=5.353e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:21:52,685 (trainer:732) INFO: 15epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.099, loss_ctc=123.799, loss_att=72.338, acc=0.298, loss=87.776, backward_time=0.054, grad_norm=143.165, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.344e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:23:20,921 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:23:24,330 (trainer:732) INFO: 15epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.097, loss_ctc=123.959, loss_att=72.328, acc=0.300, loss=87.817, backward_time=0.054, grad_norm=151.770, clip=100.000, loss_scale=251.339, optim_step_time=0.033, optim0_lr0=5.335e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:24:56,218 (trainer:732) INFO: 15epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.100, loss_ctc=122.987, loss_att=72.018, acc=0.299, loss=87.309, backward_time=0.053, grad_norm=152.568, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.326e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:26:28,700 (trainer:732) INFO: 15epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.100, loss_ctc=124.460, loss_att=72.871, acc=0.301, loss=88.348, backward_time=0.055, grad_norm=173.369, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.317e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:28:02,470 (trainer:732) INFO: 15epoch:train:6087-6444batch: iter_time=0.006, forward_time=0.101, loss_ctc=126.098, loss_att=73.825, acc=0.303, loss=89.507, backward_time=0.053, grad_norm=163.727, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.308e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:29:35,567 (trainer:732) INFO: 15epoch:train:6445-6802batch: iter_time=0.006, forward_time=0.100, loss_ctc=126.524, loss_att=74.169, acc=0.303, loss=89.875, backward_time=0.054, grad_norm=152.020, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.299e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:31:09,108 (trainer:732) INFO: 15epoch:train:6803-7160batch: iter_time=0.007, 
forward_time=0.101, loss_ctc=125.934, loss_att=73.938, acc=0.304, loss=89.537, backward_time=0.053, grad_norm=153.318, clip=100.000, loss_scale=128.000, optim_step_time=0.033, optim0_lr0=5.290e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:32:16,765 (trainer:338) INFO: 15epoch results: [train] iter_time=0.005, forward_time=0.100, loss_ctc=127.308, loss_att=73.993, acc=0.295, loss=89.987, backward_time=0.053, grad_norm=152.580, clip=100.000, loss_scale=153.486, optim_step_time=0.033, optim0_lr0=5.377e-05, train_time=0.257, time=30 minutes and 43.35 seconds, total_count=107415, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=110.823, cer_ctc=0.850, loss_att=66.120, acc=0.349, cer=0.572, wer=1.000, loss=79.531, time=14.56 seconds, total_count=795, gpu_max_cached_mem_GB=28.451, [att_plot] time=52.41 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:32:20,431 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:32:20,446 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/5epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:32:20,446 (trainer:272) INFO: 16/100epoch started. 
Estimated time to finish: 1 day, 22 hours and 6 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:33:53,040 (trainer:732) INFO: 16epoch:train:1-358batch: iter_time=0.004, forward_time=0.101, loss_ctc=123.464, loss_att=72.191, acc=0.309, loss=87.573, backward_time=0.053, grad_norm=152.709, clip=100.000, loss_scale=185.922, optim_step_time=0.033, optim0_lr0=5.282e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:35:24,573 (trainer:732) INFO: 16epoch:train:359-716batch: iter_time=0.003, forward_time=0.100, loss_ctc=121.574, loss_att=71.218, acc=0.309, loss=86.325, backward_time=0.054, grad_norm=155.447, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.273e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:36:56,151 (trainer:732) INFO: 16epoch:train:717-1074batch: iter_time=0.001, forward_time=0.100, loss_ctc=123.623, loss_att=72.082, acc=0.315, loss=87.544, backward_time=0.055, grad_norm=161.417, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.264e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:38:27,638 (trainer:732) INFO: 16epoch:train:1075-1432batch: iter_time=7.941e-04, forward_time=0.101, loss_ctc=122.652, loss_att=71.901, acc=0.313, loss=87.126, backward_time=0.055, grad_norm=168.085, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.255e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:38:30,863 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:39:59,734 (trainer:732) INFO: 16epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.101, loss_ctc=121.923, loss_att=71.923, acc=0.312, loss=86.923, backward_time=0.053, grad_norm=152.404, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.247e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:41:31,135 (trainer:732) INFO: 16epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.100, loss_ctc=120.634, loss_att=71.263, acc=0.316, loss=86.074, backward_time=0.053, grad_norm=161.442, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.238e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:43:01,851 (trainer:732) INFO: 16epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.099, loss_ctc=116.977, loss_att=69.003, acc=0.320, loss=83.395, backward_time=0.053, grad_norm=160.223, clip=100.000, loss_scale=477.676, optim_step_time=0.033, optim0_lr0=5.230e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:44:33,675 (trainer:732) INFO: 16epoch:train:2507-2864batch: iter_time=0.005, forward_time=0.100, loss_ctc=116.543, loss_att=68.996, acc=0.319, loss=83.260, backward_time=0.053, grad_norm=152.323, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=5.221e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:46:06,279 (trainer:732) INFO: 16epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.101, loss_ctc=116.596, loss_att=69.085, acc=0.322, loss=83.338, backward_time=0.055, grad_norm=158.536, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=5.213e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:46:29,354 (preprocessor:336) WARNING: The length of the text output exceeds 100, which 
may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:46:59,690 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:47:39,727 (trainer:732) INFO: 16epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.101, loss_ctc=115.319, loss_att=68.517, acc=0.324, loss=82.557, backward_time=0.053, grad_norm=169.886, clip=100.000, loss_scale=400.852, optim_step_time=0.033, optim0_lr0=5.204e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:49:11,995 (trainer:732) INFO: 16epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.100, loss_ctc=121.985, loss_att=72.626, acc=0.320, loss=87.433, backward_time=0.053, grad_norm=166.034, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.196e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:50:43,959 (trainer:732) INFO: 16epoch:train:3939-4296batch: iter_time=0.003, forward_time=0.100, loss_ctc=119.593, loss_att=71.190, acc=0.325, loss=85.711, backward_time=0.053, grad_norm=166.754, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.187e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:52:16,386 (trainer:732) INFO: 16epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.100, loss_ctc=119.276, loss_att=70.955, acc=0.327, loss=85.451, backward_time=0.053, grad_norm=154.663, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.179e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:53:49,376 (trainer:732) INFO: 16epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.101, loss_ctc=114.182, loss_att=67.798, acc=0.333, loss=81.713, backward_time=0.054, grad_norm=163.386, 
clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.171e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:55:20,729 (trainer:732) INFO: 16epoch:train:5013-5370batch: iter_time=0.003, forward_time=0.100, loss_ctc=114.373, loss_att=68.216, acc=0.333, loss=82.063, backward_time=0.053, grad_norm=160.520, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.163e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:56:53,863 (trainer:732) INFO: 16epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.100, loss_ctc=115.512, loss_att=68.685, acc=0.338, loss=82.733, backward_time=0.054, grad_norm=165.233, clip=100.000, loss_scale=472.670, optim_step_time=0.033, optim0_lr0=5.154e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:57:16,004 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:58:25,994 (trainer:732) INFO: 16epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.098, loss_ctc=112.964, loss_att=67.272, acc=0.339, loss=80.979, backward_time=0.053, grad_norm=162.669, clip=100.000, loss_scale=316.952, optim_step_time=0.033, optim0_lr0=5.146e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 01:59:58,401 (trainer:732) INFO: 16epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.099, loss_ctc=111.777, loss_att=66.420, acc=0.342, loss=80.027, backward_time=0.053, grad_norm=154.846, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.138e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:00:23,390 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:01:30,508 (trainer:732) INFO: 16epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.099, loss_ctc=116.551, loss_att=69.199, acc=0.343, loss=83.405, backward_time=0.053, grad_norm=167.297, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.130e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:03:03,226 (trainer:732) INFO: 16epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.099, loss_ctc=110.504, loss_att=65.996, acc=0.344, loss=79.348, backward_time=0.054, grad_norm=170.711, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.122e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:04:11,310 (trainer:338) INFO: 16epoch results: [train] iter_time=0.004, forward_time=0.100, loss_ctc=117.754, loss_att=69.697, acc=0.325, loss=84.114, backward_time=0.053, grad_norm=161.214, clip=100.000, loss_scale=310.282, optim_step_time=0.033, optim0_lr0=5.201e-05, train_time=0.257, time=30 minutes and 43.45 seconds, total_count=114576, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=95.703, cer_ctc=0.712, loss_att=56.398, acc=0.422, cer=0.496, wer=0.999, loss=68.189, time=14.58 seconds, total_count=848, gpu_max_cached_mem_GB=28.451, [att_plot] time=52.83 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:04:15,089 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:04:15,104 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/6epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:04:15,105 (trainer:272) INFO: 17/100epoch started. 
Estimated time to finish: 1 day, 21 hours and 30 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:05:47,107 (trainer:732) INFO: 17epoch:train:1-358batch: iter_time=0.004, forward_time=0.100, loss_ctc=107.156, loss_att=63.374, acc=0.354, loss=76.509, backward_time=0.054, grad_norm=160.569, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.114e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:06:58,882 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:07:18,412 (trainer:732) INFO: 17epoch:train:359-716batch: iter_time=9.302e-04, forward_time=0.101, loss_ctc=114.743, loss_att=67.837, acc=0.352, loss=81.909, backward_time=0.054, grad_norm=176.358, clip=100.000, loss_scale=301.050, optim_step_time=0.033, optim0_lr0=5.106e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:08:13,090 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:08:38,648 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:08:50,075 (trainer:732) INFO: 17epoch:train:717-1074batch: iter_time=6.289e-04, forward_time=0.101, loss_ctc=113.904, loss_att=68.016, acc=0.351, loss=81.783, backward_time=0.053, grad_norm=195.357, clip=100.000, loss_scale=408.739, optim_step_time=0.033, optim0_lr0=5.098e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:10:21,857 (trainer:732) INFO: 17epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.101, loss_ctc=108.689, loss_att=64.704, acc=0.357, loss=77.900, backward_time=0.053, grad_norm=162.197, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.090e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:11:53,632 (trainer:732) INFO: 17epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.101, loss_ctc=115.014, loss_att=68.569, acc=0.354, loss=82.503, backward_time=0.052, grad_norm=182.139, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.082e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:13:24,178 (trainer:732) INFO: 17epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=110.547, loss_att=65.661, acc=0.363, loss=79.127, backward_time=0.053, grad_norm=173.392, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=5.075e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:14:55,238 (trainer:732) INFO: 17epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.100, loss_ctc=106.488, loss_att=62.973, acc=0.367, loss=76.027, backward_time=0.054, grad_norm=165.178, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=5.067e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:16:28,340 (trainer:732) INFO: 17epoch:train:2507-2864batch: iter_time=0.002, 
forward_time=0.102, loss_ctc=113.406, loss_att=67.344, acc=0.365, loss=81.163, backward_time=0.053, grad_norm=171.712, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=5.059e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:17:59,737 (trainer:732) INFO: 17epoch:train:2865-3222batch: iter_time=0.002, forward_time=0.101, loss_ctc=110.077, loss_att=65.438, acc=0.367, loss=78.829, backward_time=0.053, grad_norm=175.499, clip=100.000, loss_scale=464.804, optim_step_time=0.033, optim0_lr0=5.051e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:19:30,833 (trainer:732) INFO: 17epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.100, loss_ctc=104.538, loss_att=61.716, acc=0.376, loss=74.563, backward_time=0.054, grad_norm=157.656, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=5.044e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:21:02,723 (trainer:732) INFO: 17epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.100, loss_ctc=99.507, loss_att=59.158, acc=0.377, loss=71.263, backward_time=0.053, grad_norm=157.094, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=5.036e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:22:34,944 (trainer:732) INFO: 17epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.100, loss_ctc=105.072, loss_att=62.133, acc=0.380, loss=75.015, backward_time=0.053, grad_norm=173.464, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=5.029e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:24:07,196 (trainer:732) INFO: 17epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.100, loss_ctc=106.371, loss_att=62.901, acc=0.380, loss=75.942, backward_time=0.053, grad_norm=167.455, clip=100.000, loss_scale=512.000, 
optim_step_time=0.033, optim0_lr0=5.021e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:25:24,885 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:25:39,377 (trainer:732) INFO: 17epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.100, loss_ctc=107.217, loss_att=63.707, acc=0.382, loss=76.760, backward_time=0.054, grad_norm=181.570, clip=100.000, loss_scale=547.854, optim_step_time=0.033, optim0_lr0=5.013e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:27:12,118 (trainer:732) INFO: 17epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.101, loss_ctc=106.003, loss_att=62.749, acc=0.386, loss=75.725, backward_time=0.054, grad_norm=174.868, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=5.006e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:28:44,928 (trainer:732) INFO: 17epoch:train:5371-5728batch: iter_time=0.003, forward_time=0.104, loss_ctc=102.108, loss_att=60.722, acc=0.386, loss=73.138, backward_time=0.054, grad_norm=172.511, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.998e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:28:54,671 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:29:14,821 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:30:19,126 (trainer:732) INFO: 17epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.103, loss_ctc=101.348, loss_att=59.706, acc=0.395, loss=72.199, backward_time=0.054, grad_norm=168.772, clip=100.000, loss_scale=337.748, optim_step_time=0.033, optim0_lr0=4.991e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:31:53,559 (trainer:732) INFO: 17epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.101, loss_ctc=101.216, loss_att=59.784, acc=0.395, loss=72.213, backward_time=0.054, grad_norm=167.462, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.984e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:33:26,307 (trainer:732) INFO: 17epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.100, loss_ctc=105.352, loss_att=62.419, acc=0.395, loss=75.299, backward_time=0.054, grad_norm=171.411, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.976e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:34:59,107 (trainer:732) INFO: 17epoch:train:6803-7160batch: iter_time=0.003, forward_time=0.101, loss_ctc=106.647, loss_att=63.041, acc=0.397, loss=76.123, backward_time=0.054, grad_norm=168.833, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.969e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:36:07,859 (trainer:338) INFO: 17epoch results: [train] iter_time=0.004, forward_time=0.101, loss_ctc=107.185, loss_att=63.545, acc=0.374, loss=76.637, backward_time=0.053, grad_norm=171.161, clip=100.000, loss_scale=371.769, optim_step_time=0.033, optim0_lr0=5.041e-05, train_time=0.257, time=30 minutes and 44.73 seconds, total_count=121737, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=82.195, cer_ctc=0.603, loss_att=47.116, acc=0.504, cer=0.410, 
wer=0.998, loss=57.639, time=14.77 seconds, total_count=901, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.25 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:36:11,639 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:36:11,654 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/7epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:36:11,654 (trainer:272) INFO: 18/100epoch started. Estimated time to finish: 1 day, 20 hours and 54 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:37:43,863 (trainer:732) INFO: 18epoch:train:1-358batch: iter_time=0.002, forward_time=0.101, loss_ctc=104.861, loss_att=61.681, acc=0.405, loss=74.635, backward_time=0.053, grad_norm=176.850, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.962e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:39:15,041 (trainer:732) INFO: 18epoch:train:359-716batch: iter_time=6.796e-04, forward_time=0.101, loss_ctc=102.849, loss_att=60.364, acc=0.408, loss=73.109, backward_time=0.053, grad_norm=163.814, clip=100.000, loss_scale=280.313, optim_step_time=0.033, optim0_lr0=4.954e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:40:03,159 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:40:46,110 (trainer:732) INFO: 18epoch:train:717-1074batch: iter_time=0.002, forward_time=0.100, loss_ctc=97.904, loss_att=57.203, acc=0.415, loss=69.413, backward_time=0.053, grad_norm=162.506, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.947e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:42:17,589 (trainer:732) INFO: 18epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=96.981, loss_att=56.871, acc=0.417, loss=68.904, backward_time=0.054, grad_norm=162.350, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.940e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:42:34,897 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:43:50,164 (trainer:732) INFO: 18epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.102, loss_ctc=100.026, loss_att=58.703, acc=0.421, loss=71.100, backward_time=0.054, grad_norm=177.446, clip=100.000, loss_scale=304.045, optim_step_time=0.033, optim0_lr0=4.933e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:45:21,082 (trainer:732) INFO: 18epoch:train:1791-2148batch: iter_time=0.001, forward_time=0.101, loss_ctc=95.292, loss_att=55.866, acc=0.423, loss=67.694, backward_time=0.054, grad_norm=171.949, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.926e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:46:53,242 (trainer:732) INFO: 18epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.101, loss_ctc=99.241, loss_att=57.881, acc=0.424, loss=70.289, backward_time=0.053, grad_norm=160.671, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.918e-05, train_time=0.257 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:48:25,588 (trainer:732) INFO: 18epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.101, loss_ctc=98.748, loss_att=57.686, acc=0.425, loss=70.005, backward_time=0.053, grad_norm=169.221, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.911e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:49:58,562 (trainer:732) INFO: 18epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.101, loss_ctc=98.398, loss_att=57.592, acc=0.429, loss=69.834, backward_time=0.053, grad_norm=179.626, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.904e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:51:31,076 (trainer:732) INFO: 18epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.100, loss_ctc=96.715, loss_att=56.205, acc=0.434, loss=68.358, backward_time=0.053, grad_norm=164.330, clip=100.000, loss_scale=313.207, optim_step_time=0.033, optim0_lr0=4.897e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:53:03,877 (trainer:732) INFO: 18epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.103, loss_ctc=95.987, loss_att=55.915, acc=0.438, loss=67.937, backward_time=0.053, grad_norm=172.064, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.890e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:53:13,937 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:54:36,840 (trainer:732) INFO: 18epoch:train:3939-4296batch: iter_time=0.002, forward_time=0.104, loss_ctc=99.200, loss_att=57.776, acc=0.439, loss=70.203, backward_time=0.053, grad_norm=176.646, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.883e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:56:10,524 (trainer:732) INFO: 18epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.105, loss_ctc=94.939, loss_att=55.458, acc=0.442, loss=67.302, backward_time=0.053, grad_norm=173.635, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.876e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:56:39,433 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:57:44,292 (trainer:732) INFO: 18epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.103, loss_ctc=94.569, loss_att=55.031, acc=0.446, loss=66.892, backward_time=0.054, grad_norm=170.778, clip=100.000, loss_scale=332.728, optim_step_time=0.033, optim0_lr0=4.870e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 02:59:16,300 (trainer:732) INFO: 18epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.101, loss_ctc=97.158, loss_att=56.553, acc=0.446, loss=68.735, backward_time=0.053, grad_norm=174.025, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.863e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:00:49,828 (trainer:732) INFO: 18epoch:train:5371-5728batch: iter_time=0.007, forward_time=0.101, loss_ctc=93.377, loss_att=54.177, acc=0.452, loss=65.937, backward_time=0.054, grad_norm=168.238, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=4.856e-05, train_time=0.261 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:02:23,234 (trainer:732) INFO: 18epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.101, loss_ctc=94.935, loss_att=55.179, acc=0.453, loss=67.106, backward_time=0.054, grad_norm=181.872, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.849e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:03:55,657 (trainer:732) INFO: 18epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.100, loss_ctc=94.391, loss_att=54.558, acc=0.457, loss=66.508, backward_time=0.053, grad_norm=175.697, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.842e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:05:05,392 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:05:29,551 (trainer:732) INFO: 18epoch:train:6445-6802batch: iter_time=0.008, forward_time=0.100, loss_ctc=93.283, loss_att=54.156, acc=0.459, loss=65.894, backward_time=0.054, grad_norm=177.327, clip=100.000, loss_scale=284.603, optim_step_time=0.033, optim0_lr0=4.835e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:07:03,392 (trainer:732) INFO: 18epoch:train:6803-7160batch: iter_time=0.006, forward_time=0.101, loss_ctc=93.532, loss_att=54.117, acc=0.461, loss=65.941, backward_time=0.056, grad_norm=176.913, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.829e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:08:12,054 (trainer:338) INFO: 18epoch results: [train] iter_time=0.004, forward_time=0.101, loss_ctc=97.098, loss_att=56.636, acc=0.435, loss=68.775, backward_time=0.054, grad_norm=171.804, clip=100.000, 
loss_scale=344.575, optim_step_time=0.033, optim0_lr0=4.894e-05, train_time=0.258, time=30 minutes and 52.44 seconds, total_count=128898, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=70.595, cer_ctc=0.471, loss_att=39.103, acc=0.588, cer=0.322, wer=0.997, loss=48.551, time=14.77 seconds, total_count=954, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.18 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:08:15,619 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:08:15,634 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/8epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:08:15,634 (trainer:272) INFO: 19/100epoch started. Estimated time to finish: 1 day, 20 hours and 19 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:09:46,264 (trainer:732) INFO: 19epoch:train:1-358batch: iter_time=0.003, forward_time=0.099, loss_ctc=89.435, loss_att=51.337, acc=0.471, loss=62.766, backward_time=0.053, grad_norm=174.773, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.822e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:11:17,883 (trainer:732) INFO: 19epoch:train:359-716batch: iter_time=0.004, forward_time=0.100, loss_ctc=88.513, loss_att=51.003, acc=0.473, loss=62.256, backward_time=0.052, grad_norm=173.073, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.815e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:12:48,265 (trainer:732) INFO: 19epoch:train:717-1074batch: iter_time=8.311e-04, forward_time=0.099, loss_ctc=92.381, loss_att=53.080, acc=0.473, loss=64.871, backward_time=0.053, grad_norm=172.796, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.809e-05, 
train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:14:19,132 (trainer:732) INFO: 19epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.099, loss_ctc=90.302, loss_att=51.802, acc=0.478, loss=63.352, backward_time=0.053, grad_norm=173.447, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.802e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:15:21,698 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:15:50,031 (trainer:732) INFO: 19epoch:train:1433-1790batch: iter_time=8.921e-04, forward_time=0.100, loss_ctc=93.820, loss_att=54.012, acc=0.475, loss=65.954, backward_time=0.053, grad_norm=176.798, clip=100.000, loss_scale=623.866, optim_step_time=0.033, optim0_lr0=4.795e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:17:19,981 (trainer:732) INFO: 19epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.098, loss_ctc=89.012, loss_att=50.816, acc=0.484, loss=62.275, backward_time=0.053, grad_norm=165.940, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.789e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:17:47,297 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:17:47,942 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:18:51,138 (trainer:732) INFO: 19epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.099, loss_ctc=91.810, loss_att=52.819, acc=0.479, loss=64.516, backward_time=0.053, grad_norm=178.545, clip=100.000, loss_scale=333.445, optim_step_time=0.033, optim0_lr0=4.782e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:20:23,017 (trainer:732) INFO: 19epoch:train:2507-2864batch: iter_time=0.006, forward_time=0.098, loss_ctc=86.364, loss_att=49.510, acc=0.490, loss=60.566, backward_time=0.053, grad_norm=179.480, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.776e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:21:56,624 (trainer:732) INFO: 19epoch:train:2865-3222batch: iter_time=0.006, forward_time=0.101, loss_ctc=88.815, loss_att=50.706, acc=0.492, loss=62.139, backward_time=0.054, grad_norm=176.221, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.769e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:23:28,334 (trainer:732) INFO: 19epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.099, loss_ctc=87.607, loss_att=49.965, acc=0.495, loss=61.258, backward_time=0.055, grad_norm=171.496, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.763e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:25:02,226 (trainer:732) INFO: 19epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.100, loss_ctc=87.254, loss_att=49.714, acc=0.496, loss=60.976, backward_time=0.055, grad_norm=171.715, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.757e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:26:35,292 (trainer:732) INFO: 19epoch:train:3939-4296batch: iter_time=0.005, 
forward_time=0.101, loss_ctc=88.511, loss_att=50.751, acc=0.495, loss=62.079, backward_time=0.053, grad_norm=176.882, clip=100.000, loss_scale=283.888, optim_step_time=0.033, optim0_lr0=4.750e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:28:06,481 (trainer:732) INFO: 19epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.098, loss_ctc=90.244, loss_att=51.734, acc=0.496, loss=63.287, backward_time=0.053, grad_norm=183.346, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.744e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:29:40,895 (trainer:732) INFO: 19epoch:train:4655-5012batch: iter_time=0.007, forward_time=0.101, loss_ctc=91.139, loss_att=52.214, acc=0.498, loss=63.892, backward_time=0.053, grad_norm=177.759, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.737e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:30:40,479 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:31:13,358 (trainer:732) INFO: 19epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.098, loss_ctc=85.440, loss_att=48.710, acc=0.507, loss=59.729, backward_time=0.053, grad_norm=170.792, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.731e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:32:46,930 (trainer:732) INFO: 19epoch:train:5371-5728batch: iter_time=0.007, forward_time=0.100, loss_ctc=85.485, loss_att=48.764, acc=0.504, loss=59.780, backward_time=0.053, grad_norm=170.655, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.725e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:34:20,696 (trainer:732) INFO: 19epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.100, loss_ctc=84.742, loss_att=48.076, acc=0.512, loss=59.076, backward_time=0.054, grad_norm=176.646, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.718e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:35:07,509 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:35:53,757 (trainer:732) INFO: 19epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.099, loss_ctc=87.237, loss_att=49.672, acc=0.512, loss=60.942, backward_time=0.054, grad_norm=181.860, clip=100.000, loss_scale=523.473, optim_step_time=0.033, optim0_lr0=4.712e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:37:25,777 (trainer:732) INFO: 19epoch:train:6445-6802batch: iter_time=0.004, forward_time=0.099, loss_ctc=87.086, loss_att=49.585, acc=0.515, loss=60.835, backward_time=0.055, grad_norm=180.377, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.706e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:38:20,662 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:38:58,723 (trainer:732) INFO: 19epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.099, loss_ctc=84.154, loss_att=47.540, acc=0.516, loss=58.524, backward_time=0.053, grad_norm=164.630, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.700e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:40:06,658 (trainer:338) INFO: 19epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=88.427, loss_att=50.566, acc=0.493, loss=61.924, backward_time=0.054, grad_norm=174.855, clip=100.000, loss_scale=446.623, optim_step_time=0.033, optim0_lr0=4.760e-05, train_time=0.257, time=30 minutes and 43.73 seconds, total_count=136059, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=62.404, cer_ctc=0.402, loss_att=33.583, acc=0.650, cer=0.266, wer=0.991, loss=42.230, time=14.4 seconds, total_count=1007, gpu_max_cached_mem_GB=28.451, [att_plot] time=52.89 seconds, 
total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:40:10,415 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:40:10,432 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/9epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:40:10,432 (trainer:272) INFO: 20/100epoch started. Estimated time to finish: 1 day, 19 hours and 45 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:41:43,504 (trainer:732) INFO: 20epoch:train:1-358batch: iter_time=0.003, forward_time=0.103, loss_ctc=86.521, loss_att=48.918, acc=0.521, loss=60.199, backward_time=0.053, grad_norm=179.383, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.694e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:43:14,280 (trainer:732) INFO: 20epoch:train:359-716batch: iter_time=3.192e-04, forward_time=0.102, loss_ctc=84.161, loss_att=47.412, acc=0.523, loss=58.437, backward_time=0.053, grad_norm=174.878, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.687e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:44:46,662 (trainer:732) INFO: 20epoch:train:717-1074batch: iter_time=7.587e-04, forward_time=0.104, loss_ctc=84.999, loss_att=47.875, acc=0.525, loss=59.012, backward_time=0.054, grad_norm=169.380, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.681e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:45:41,811 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:46:18,584 (trainer:732) INFO: 20epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.102, loss_ctc=80.386, loss_att=45.182, acc=0.530, loss=55.743, backward_time=0.055, grad_norm=167.277, clip=100.000, loss_scale=775.888, optim_step_time=0.033, optim0_lr0=4.675e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:47:50,396 (trainer:732) INFO: 20epoch:train:1433-1790batch: iter_time=4.114e-04, forward_time=0.103, loss_ctc=84.279, loss_att=47.653, acc=0.526, loss=58.641, backward_time=0.053, grad_norm=177.600, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.669e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:49:23,723 (trainer:732) INFO: 20epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.105, loss_ctc=80.638, loss_att=45.399, acc=0.534, loss=55.971, backward_time=0.053, grad_norm=171.942, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.663e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:50:19,011 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:50:55,858 (trainer:732) INFO: 20epoch:train:2149-2506batch: iter_time=9.425e-04, forward_time=0.103, loss_ctc=81.955, loss_att=46.349, acc=0.532, loss=57.031, backward_time=0.053, grad_norm=173.295, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.657e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:52:27,707 (trainer:732) INFO: 20epoch:train:2507-2864batch: iter_time=4.758e-04, forward_time=0.103, loss_ctc=86.251, loss_att=48.670, acc=0.535, loss=59.944, backward_time=0.054, grad_norm=181.335, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.651e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:54:00,116 (trainer:732) INFO: 20epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.102, loss_ctc=75.602, loss_att=42.405, acc=0.544, loss=52.364, backward_time=0.057, grad_norm=171.949, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.645e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:55:34,138 (trainer:732) INFO: 20epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.105, loss_ctc=83.500, loss_att=46.785, acc=0.540, loss=57.799, backward_time=0.053, grad_norm=172.418, clip=100.000, loss_scale=926.749, optim_step_time=0.033, optim0_lr0=4.639e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:55:38,622 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:57:07,101 (trainer:732) INFO: 20epoch:train:3581-3938batch: iter_time=0.002, forward_time=0.104, loss_ctc=81.685, loss_att=45.857, acc=0.544, loss=56.605, backward_time=0.054, grad_norm=176.658, clip=100.000, loss_scale=537.815, optim_step_time=0.033, optim0_lr0=4.633e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:57:49,552 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 03:58:40,813 (trainer:732) INFO: 20epoch:train:3939-4296batch: iter_time=0.003, forward_time=0.104, loss_ctc=80.386, loss_att=45.008, acc=0.545, loss=55.621, backward_time=0.054, grad_norm=174.429, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.627e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:00:12,976 (trainer:732) INFO: 20epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.102, loss_ctc=76.513, loss_att=42.750, acc=0.549, loss=52.879, backward_time=0.054, grad_norm=172.322, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.621e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:01:44,260 (trainer:732) INFO: 20epoch:train:4655-5012batch: iter_time=0.001, forward_time=0.102, loss_ctc=78.693, loss_att=43.921, acc=0.552, loss=54.353, backward_time=0.053, grad_norm=171.849, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.615e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:03:16,574 (trainer:732) INFO: 20epoch:train:5013-5370batch: iter_time=0.001, forward_time=0.103, loss_ctc=83.949, loss_att=47.350, acc=0.546, loss=58.329, backward_time=0.053, grad_norm=180.464, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.610e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:04:50,258 (trainer:732) INFO: 20epoch:train:5371-5728batch: iter_time=0.007, forward_time=0.102, loss_ctc=80.232, loss_att=44.775, acc=0.554, loss=55.412, backward_time=0.053, grad_norm=175.647, clip=100.000, loss_scale=696.492, optim_step_time=0.033, optim0_lr0=4.604e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:04:51,113 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:06:23,576 (trainer:732) INFO: 20epoch:train:5729-6086batch: iter_time=0.002, forward_time=0.105, loss_ctc=80.042, loss_att=44.746, acc=0.558, loss=55.335, backward_time=0.053, grad_norm=180.551, clip=100.000, loss_scale=514.868, optim_step_time=0.033, optim0_lr0=4.598e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:07:24,166 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:07:55,728 (trainer:732) INFO: 20epoch:train:6087-6444batch: iter_time=0.002, forward_time=0.103, loss_ctc=81.929, loss_att=45.786, acc=0.556, loss=56.629, backward_time=0.053, grad_norm=181.181, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.592e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:09:28,270 (trainer:732) INFO: 20epoch:train:6445-6802batch: iter_time=0.003, forward_time=0.102, loss_ctc=80.500, loss_att=44.952, acc=0.560, loss=55.616, backward_time=0.053, grad_norm=178.913, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.586e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:11:01,083 (trainer:732) INFO: 20epoch:train:6803-7160batch: iter_time=0.003, forward_time=0.103, loss_ctc=78.276, loss_att=43.835, acc=0.561, loss=54.167, backward_time=0.053, grad_norm=175.444, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.581e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:12:09,714 (trainer:338) INFO: 20epoch results: [train] iter_time=0.002, forward_time=0.103, loss_ctc=81.468, loss_att=45.748, acc=0.542, loss=56.464, backward_time=0.054, grad_norm=175.343, clip=100.000, loss_scale=556.562, optim_step_time=0.033, optim0_lr0=4.636e-05, train_time=0.258, time=30 minutes and 51.45 seconds, total_count=143220, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=55.448, cer_ctc=0.352, loss_att=29.300, acc=0.700, cer=0.221, wer=0.983, loss=37.144, time=14.75 seconds, total_count=1060, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.08 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:12:13,393 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 
2023-05-13 04:12:13,409 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/10epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:12:13,410 (trainer:272) INFO: 21/100epoch started. Estimated time to finish: 1 day, 19 hours and 11 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:13:45,235 (trainer:732) INFO: 21epoch:train:1-358batch: iter_time=0.003, forward_time=0.101, loss_ctc=78.123, loss_att=43.210, acc=0.566, loss=53.684, backward_time=0.053, grad_norm=175.015, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.575e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:15:15,653 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:15:16,883 (trainer:732) INFO: 21epoch:train:359-716batch: iter_time=0.004, forward_time=0.100, loss_ctc=77.737, loss_att=43.104, acc=0.569, loss=53.494, backward_time=0.053, grad_norm=173.734, clip=100.000, loss_scale=712.784, optim_step_time=0.033, optim0_lr0=4.569e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:16:48,250 (trainer:732) INFO: 21epoch:train:717-1074batch: iter_time=0.001, forward_time=0.101, loss_ctc=78.157, loss_att=43.258, acc=0.571, loss=53.728, backward_time=0.053, grad_norm=181.942, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=4.564e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:18:18,497 (trainer:732) INFO: 21epoch:train:1075-1432batch: iter_time=6.395e-04, forward_time=0.100, loss_ctc=78.718, loss_att=43.398, acc=0.572, loss=53.994, backward_time=0.053, grad_norm=170.452, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.558e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 
04:19:49,792 (trainer:732) INFO: 21epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.100, loss_ctc=76.490, loss_att=42.512, acc=0.572, loss=52.706, backward_time=0.053, grad_norm=175.998, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.552e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:21:21,452 (trainer:732) INFO: 21epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.100, loss_ctc=76.118, loss_att=42.240, acc=0.573, loss=52.403, backward_time=0.053, grad_norm=178.222, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.547e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:22:52,941 (trainer:732) INFO: 21epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.100, loss_ctc=77.597, loss_att=42.903, acc=0.574, loss=53.312, backward_time=0.053, grad_norm=175.809, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.541e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:24:24,452 (trainer:732) INFO: 21epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.100, loss_ctc=79.729, loss_att=44.143, acc=0.577, loss=54.819, backward_time=0.053, grad_norm=178.372, clip=100.000, loss_scale=730.816, optim_step_time=0.033, optim0_lr0=4.535e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:24:27,957 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:25:56,500 (trainer:732) INFO: 21epoch:train:2865-3222batch: iter_time=0.006, forward_time=0.099, loss_ctc=77.242, loss_att=42.672, acc=0.578, loss=53.043, backward_time=0.053, grad_norm=178.911, clip=100.000, loss_scale=530.644, optim_step_time=0.033, optim0_lr0=4.530e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:27:28,034 (trainer:732) INFO: 21epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.100, loss_ctc=76.177, loss_att=42.153, acc=0.579, loss=52.360, backward_time=0.054, grad_norm=177.712, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.524e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:28:59,621 (trainer:732) INFO: 21epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.099, loss_ctc=75.722, loss_att=41.791, acc=0.578, loss=51.970, backward_time=0.053, grad_norm=175.625, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.519e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:30:33,432 (trainer:732) INFO: 21epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.103, loss_ctc=77.086, loss_att=42.726, acc=0.580, loss=53.034, backward_time=0.054, grad_norm=180.446, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.513e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:32:05,700 (trainer:732) INFO: 21epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.099, loss_ctc=74.441, loss_att=41.144, acc=0.583, loss=51.133, backward_time=0.055, grad_norm=178.334, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.508e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:32:41,725 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:33:38,423 (trainer:732) INFO: 21epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.100, loss_ctc=72.901, loss_att=40.184, acc=0.590, loss=49.999, backward_time=0.053, grad_norm=170.772, clip=100.000, loss_scale=703.642, optim_step_time=0.033, optim0_lr0=4.502e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:34:42,431 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:35:09,877 (trainer:732) INFO: 21epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.099, loss_ctc=72.637, loss_att=39.841, acc=0.593, loss=49.680, backward_time=0.054, grad_norm=166.115, clip=100.000, loss_scale=871.978, optim_step_time=0.033, optim0_lr0=4.497e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:36:42,669 (trainer:732) INFO: 21epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.100, loss_ctc=73.959, loss_att=40.681, acc=0.593, loss=50.665, backward_time=0.054, grad_norm=172.505, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.492e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:37:22,966 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:38:14,672 (trainer:732) INFO: 21epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.099, loss_ctc=74.394, loss_att=41.026, acc=0.588, loss=51.036, backward_time=0.053, grad_norm=179.103, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.486e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:39:48,419 (trainer:732) INFO: 21epoch:train:6087-6444batch: iter_time=0.010, forward_time=0.100, loss_ctc=72.517, loss_att=39.769, acc=0.596, loss=49.594, backward_time=0.054, grad_norm=177.369, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.481e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:40:35,376 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:41:21,685 (trainer:732) INFO: 21epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.101, loss_ctc=76.062, loss_att=41.833, acc=0.592, loss=52.102, backward_time=0.053, grad_norm=177.106, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.475e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:42:55,993 (trainer:732) INFO: 21epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.102, loss_ctc=74.831, loss_att=41.120, acc=0.598, loss=51.233, backward_time=0.053, grad_norm=181.237, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.470e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:44:04,541 (trainer:338) INFO: 21epoch results: [train] iter_time=0.004, forward_time=0.100, loss_ctc=75.993, loss_att=41.963, acc=0.581, loss=52.172, backward_time=0.053, grad_norm=176.228, clip=100.000, 
loss_scale=561.426, optim_step_time=0.033, optim0_lr0=4.522e-05, train_time=0.257, time=30 minutes and 43.23 seconds, total_count=150381, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=50.574, cer_ctc=0.312, loss_att=26.281, acc=0.735, cer=0.192, wer=0.975, loss=33.569, time=14.62 seconds, total_count=1113, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.28 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:44:08,310 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:44:08,327 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/11epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:44:08,327 (trainer:272) INFO: 22/100epoch started. Estimated time to finish: 1 day, 18 hours and 36 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:45:26,029 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:45:26,691 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:45:41,975 (trainer:732) INFO: 22epoch:train:1-358batch: iter_time=0.002, forward_time=0.105, loss_ctc=74.257, loss_att=40.732, acc=0.596, loss=50.790, backward_time=0.053, grad_norm=173.852, clip=100.000, loss_scale=791.664, optim_step_time=0.033, optim0_lr0=4.465e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:47:14,556 (trainer:732) INFO: 22epoch:train:359-716batch: iter_time=0.002, forward_time=0.103, loss_ctc=74.361, loss_att=40.570, acc=0.602, loss=50.708, backward_time=0.054, grad_norm=175.587, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.459e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:48:46,457 (trainer:732) INFO: 22epoch:train:717-1074batch: iter_time=0.001, forward_time=0.101, loss_ctc=74.973, loss_att=40.845, acc=0.603, loss=51.084, backward_time=0.053, grad_norm=182.472, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.454e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:50:18,045 (trainer:732) INFO: 22epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=74.416, loss_att=40.548, acc=0.604, loss=50.709, backward_time=0.054, grad_norm=176.771, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.449e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:51:50,511 (trainer:732) INFO: 22epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.102, loss_ctc=69.406, loss_att=37.729, acc=0.609, loss=47.232, backward_time=0.053, grad_norm=167.445, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=4.444e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:53:21,850 (trainer:732) INFO: 22epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.100, 
loss_ctc=70.706, loss_att=38.505, acc=0.609, loss=48.165, backward_time=0.054, grad_norm=173.691, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.438e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:54:54,365 (trainer:732) INFO: 22epoch:train:2149-2506batch: iter_time=0.001, forward_time=0.102, loss_ctc=72.309, loss_att=39.376, acc=0.607, loss=49.256, backward_time=0.054, grad_norm=173.223, clip=100.000, loss_scale=808.045, optim_step_time=0.035, optim0_lr0=4.433e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:55:22,184 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:56:24,959 (trainer:732) INFO: 22epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.099, loss_ctc=71.479, loss_att=38.929, acc=0.612, loss=48.694, backward_time=0.054, grad_norm=173.133, clip=100.000, loss_scale=668.325, optim_step_time=0.033, optim0_lr0=4.428e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:57:57,521 (trainer:732) INFO: 22epoch:train:2865-3222batch: iter_time=0.006, forward_time=0.100, loss_ctc=69.431, loss_att=37.594, acc=0.617, loss=47.145, backward_time=0.053, grad_norm=179.268, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.423e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 04:59:30,545 (trainer:732) INFO: 22epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.101, loss_ctc=68.593, loss_att=37.367, acc=0.613, loss=46.735, backward_time=0.053, grad_norm=176.216, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=4.418e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:01:02,386 (trainer:732) INFO: 22epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.101, 
loss_ctc=70.166, loss_att=38.364, acc=0.613, loss=47.904, backward_time=0.053, grad_norm=175.443, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.413e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:02:33,961 (trainer:732) INFO: 22epoch:train:3939-4296batch: iter_time=0.002, forward_time=0.101, loss_ctc=72.486, loss_att=39.551, acc=0.613, loss=49.431, backward_time=0.053, grad_norm=180.766, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.407e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:04:06,904 (trainer:732) INFO: 22epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.100, loss_ctc=71.097, loss_att=38.668, acc=0.613, loss=48.397, backward_time=0.053, grad_norm=178.901, clip=100.000, loss_scale=566.346, optim_step_time=0.033, optim0_lr0=4.402e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:05:25,970 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:05:39,072 (trainer:732) INFO: 22epoch:train:4655-5012batch: iter_time=0.002, forward_time=0.101, loss_ctc=73.918, loss_att=40.267, acc=0.614, loss=50.363, backward_time=0.055, grad_norm=176.588, clip=100.000, loss_scale=949.423, optim_step_time=0.033, optim0_lr0=4.397e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:07:11,393 (trainer:732) INFO: 22epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.100, loss_ctc=69.932, loss_att=38.142, acc=0.615, loss=47.679, backward_time=0.054, grad_norm=173.947, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.392e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:08:43,970 (trainer:732) INFO: 22epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.100, loss_ctc=72.457, loss_att=39.345, acc=0.619, loss=49.279, backward_time=0.053, grad_norm=181.127, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.387e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:10:06,835 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:10:17,230 (trainer:732) INFO: 22epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.100, loss_ctc=68.784, loss_att=37.507, acc=0.621, loss=46.890, backward_time=0.054, grad_norm=178.771, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.382e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:11:51,140 (trainer:732) INFO: 22epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.101, loss_ctc=70.821, loss_att=38.383, acc=0.625, loss=48.115, backward_time=0.055, grad_norm=178.983, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.377e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:13:22,665 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:13:24,321 (trainer:732) INFO: 22epoch:train:6445-6802batch: iter_time=0.006, forward_time=0.101, loss_ctc=70.694, loss_att=38.434, acc=0.622, loss=48.112, backward_time=0.053, grad_norm=177.280, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.372e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:14:20,529 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:14:58,143 (trainer:732) INFO: 22epoch:train:6803-7160batch: iter_time=0.006, forward_time=0.101, loss_ctc=71.217, loss_att=38.462, acc=0.626, loss=48.288, backward_time=0.053, grad_norm=174.425, clip=100.000, loss_scale=596.616, optim_step_time=0.034, optim0_lr0=4.367e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:16:06,866 (trainer:338) INFO: 22epoch results: [train] iter_time=0.004, forward_time=0.101, loss_ctc=71.550, loss_att=38.952, acc=0.613, loss=48.732, backward_time=0.054, grad_norm=176.399, clip=100.000, loss_scale=577.315, optim_step_time=0.033, optim0_lr0=4.415e-05, train_time=0.258, time=30 minutes and 50.63 seconds, total_count=157542, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=46.249, cer_ctc=0.288, loss_att=23.743, acc=0.763, cer=0.168, wer=0.963, loss=30.495, time=14.41 seconds, total_count=1166, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.5 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:16:10,427 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:16:10,445 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/12epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:16:10,445 (trainer:272) INFO: 23/100epoch started. 
Estimated time to finish: 1 day, 18 hours and 3 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:17:41,453 (trainer:732) INFO: 23epoch:train:1-358batch: iter_time=0.003, forward_time=0.099, loss_ctc=68.170, loss_att=36.778, acc=0.626, loss=46.196, backward_time=0.053, grad_norm=173.216, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.362e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:19:12,299 (trainer:732) INFO: 23epoch:train:359-716batch: iter_time=0.002, forward_time=0.099, loss_ctc=67.728, loss_att=36.496, acc=0.632, loss=45.865, backward_time=0.054, grad_norm=176.587, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.357e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:20:44,011 (trainer:732) INFO: 23epoch:train:717-1074batch: iter_time=0.005, forward_time=0.099, loss_ctc=67.548, loss_att=36.360, acc=0.631, loss=45.716, backward_time=0.053, grad_norm=173.325, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.352e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:21:48,189 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:22:15,068 (trainer:732) INFO: 23epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=68.001, loss_att=36.660, acc=0.633, loss=46.063, backward_time=0.053, grad_norm=175.604, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.347e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:23:45,956 (trainer:732) INFO: 23epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.099, loss_ctc=69.147, loss_att=37.354, acc=0.630, loss=46.892, backward_time=0.054, grad_norm=173.544, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.343e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:25:05,309 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:25:17,300 (trainer:732) INFO: 23epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.100, loss_ctc=68.924, loss_att=37.186, acc=0.632, loss=46.707, backward_time=0.054, grad_norm=172.739, clip=100.000, loss_scale=859.070, optim_step_time=0.033, optim0_lr0=4.338e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:25:47,273 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:26:48,517 (trainer:732) INFO: 23epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.099, loss_ctc=70.130, loss_att=37.899, acc=0.631, loss=47.568, backward_time=0.054, grad_norm=182.929, clip=100.000, loss_scale=340.616, optim_step_time=0.033, optim0_lr0=4.333e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:28:20,555 (trainer:732) INFO: 23epoch:train:2507-2864batch: iter_time=7.007e-04, forward_time=0.101, loss_ctc=73.721, loss_att=39.943, acc=0.628, loss=50.076, backward_time=0.053, grad_norm=185.186, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.328e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:29:52,843 (trainer:732) INFO: 23epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.099, loss_ctc=70.815, loss_att=38.015, acc=0.639, loss=47.855, backward_time=0.053, grad_norm=175.077, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.323e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:31:26,018 (trainer:732) INFO: 23epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.101, loss_ctc=71.668, loss_att=38.512, acc=0.635, loss=48.459, backward_time=0.054, grad_norm=181.822, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.318e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:32:56,762 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:32:58,547 (trainer:732) INFO: 23epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.100, loss_ctc=67.309, loss_att=36.179, acc=0.638, loss=45.518, backward_time=0.053, grad_norm=176.867, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.314e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:34:30,201 (trainer:732) INFO: 23epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.099, loss_ctc=67.503, loss_att=36.328, acc=0.638, loss=45.681, backward_time=0.054, grad_norm=179.274, clip=100.000, loss_scale=276.737, optim_step_time=0.033, optim0_lr0=4.309e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:34:32,845 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:36:02,304 (trainer:732) INFO: 23epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.099, loss_ctc=66.850, loss_att=36.028, acc=0.640, loss=45.274, backward_time=0.053, grad_norm=175.836, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.304e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:37:35,047 (trainer:732) INFO: 23epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.102, loss_ctc=67.412, loss_att=35.937, acc=0.643, loss=45.379, backward_time=0.053, grad_norm=175.085, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.299e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:39:08,322 (trainer:732) INFO: 23epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.102, loss_ctc=66.190, loss_att=35.529, acc=0.644, loss=44.728, backward_time=0.054, grad_norm=175.594, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.295e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:40:40,784 (trainer:732) INFO: 23epoch:train:5371-5728batch: iter_time=0.003, forward_time=0.103, loss_ctc=67.225, loss_att=35.884, acc=0.647, loss=45.286, backward_time=0.053, grad_norm=179.361, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.290e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:42:13,117 (trainer:732) INFO: 23epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.100, loss_ctc=65.341, loss_att=35.123, acc=0.643, loss=44.188, backward_time=0.054, grad_norm=174.191, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.285e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:43:12,180 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:43:46,173 (trainer:732) INFO: 23epoch:train:6087-6444batch: iter_time=0.006, forward_time=0.101, loss_ctc=65.325, loss_att=35.052, acc=0.647, loss=44.134, backward_time=0.054, grad_norm=173.455, clip=100.000, loss_scale=572.235, optim_step_time=0.033, optim0_lr0=4.280e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:45:17,641 (trainer:732) INFO: 23epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.099, loss_ctc=65.445, loss_att=35.061, acc=0.648, loss=44.176, backward_time=0.054, grad_norm=173.841, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.276e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:46:50,470 (trainer:732) INFO: 23epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.099, loss_ctc=65.845, loss_att=35.179, acc=0.650, loss=44.379, backward_time=0.053, grad_norm=172.866, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.271e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:47:59,289 (trainer:338) INFO: 23epoch results: [train] iter_time=0.004, forward_time=0.100, loss_ctc=67.988, loss_att=36.560, acc=0.638, loss=45.988, backward_time=0.053, grad_norm=176.344, clip=100.000, loss_scale=460.786, optim_step_time=0.033, optim0_lr0=4.316e-05, train_time=0.257, time=30 minutes and 40.73 seconds, total_count=164703, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=43.316, cer_ctc=0.261, loss_att=21.875, acc=0.783, cer=0.152, wer=0.949, loss=28.307, time=14.64 seconds, total_count=1219, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.47 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:48:03,111 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:48:03,129 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/13epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:48:03,129 (trainer:272) INFO: 24/100epoch started. 
Estimated time to finish: 1 day, 17 hours and 28 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:49:35,142 (trainer:732) INFO: 24epoch:train:1-358batch: iter_time=0.003, forward_time=0.101, loss_ctc=67.349, loss_att=35.988, acc=0.650, loss=45.396, backward_time=0.052, grad_norm=173.874, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=4.266e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:51:07,042 (trainer:732) INFO: 24epoch:train:359-716batch: iter_time=0.001, forward_time=0.102, loss_ctc=65.923, loss_att=35.233, acc=0.651, loss=44.440, backward_time=0.053, grad_norm=176.162, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=4.262e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:52:40,196 (trainer:732) INFO: 24epoch:train:717-1074batch: iter_time=2.444e-04, forward_time=0.104, loss_ctc=68.373, loss_att=36.632, acc=0.649, loss=46.154, backward_time=0.053, grad_norm=178.797, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=4.257e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:53:19,407 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:54:11,866 (trainer:732) INFO: 24epoch:train:1075-1432batch: iter_time=5.329e-04, forward_time=0.101, loss_ctc=67.629, loss_att=35.966, acc=0.655, loss=45.465, backward_time=0.053, grad_norm=177.888, clip=100.000, loss_scale=622.431, optim_step_time=0.033, optim0_lr0=4.253e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:55:44,256 (trainer:732) INFO: 24epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.100, loss_ctc=68.660, loss_att=36.769, acc=0.648, loss=46.336, backward_time=0.053, grad_norm=179.395, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.248e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:57:16,574 (trainer:732) INFO: 24epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.101, loss_ctc=65.399, loss_att=34.688, acc=0.656, loss=43.901, backward_time=0.053, grad_norm=173.297, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.243e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 05:58:48,218 (trainer:732) INFO: 24epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.100, loss_ctc=60.986, loss_att=32.377, acc=0.659, loss=40.960, backward_time=0.054, grad_norm=165.706, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.239e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:00:19,242 (trainer:732) INFO: 24epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.099, loss_ctc=62.700, loss_att=33.437, acc=0.660, loss=42.216, backward_time=0.055, grad_norm=169.008, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.234e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:01:52,759 (trainer:732) INFO: 24epoch:train:2865-3222batch: iter_time=0.003, 
forward_time=0.102, loss_ctc=65.807, loss_att=35.158, acc=0.656, loss=44.352, backward_time=0.053, grad_norm=179.034, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.230e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:03:25,590 (trainer:732) INFO: 24epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.100, loss_ctc=63.025, loss_att=33.447, acc=0.660, loss=42.321, backward_time=0.053, grad_norm=173.238, clip=100.000, loss_scale=1.017e+03, optim_step_time=0.033, optim0_lr0=4.225e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:04:26,432 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:04:59,078 (trainer:732) INFO: 24epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.101, loss_ctc=65.063, loss_att=34.580, acc=0.661, loss=43.725, backward_time=0.055, grad_norm=176.851, clip=100.000, loss_scale=841.860, optim_step_time=0.033, optim0_lr0=4.221e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:06:30,115 (trainer:732) INFO: 24epoch:train:3939-4296batch: iter_time=0.002, forward_time=0.099, loss_ctc=66.014, loss_att=35.080, acc=0.660, loss=44.360, backward_time=0.054, grad_norm=180.720, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.216e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:08:01,755 (trainer:732) INFO: 24epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.100, loss_ctc=67.331, loss_att=35.920, acc=0.654, loss=45.343, backward_time=0.053, grad_norm=183.912, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.212e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:09:35,310 (trainer:732) INFO: 24epoch:train:4655-5012batch: iter_time=0.006, 
forward_time=0.101, loss_ctc=65.661, loss_att=35.007, acc=0.659, loss=44.203, backward_time=0.054, grad_norm=176.202, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.207e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:11:07,426 (trainer:732) INFO: 24epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.100, loss_ctc=63.164, loss_att=33.581, acc=0.663, loss=42.456, backward_time=0.055, grad_norm=177.484, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.203e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:12:40,987 (trainer:732) INFO: 24epoch:train:5371-5728batch: iter_time=0.007, forward_time=0.101, loss_ctc=62.591, loss_att=33.395, acc=0.660, loss=42.154, backward_time=0.054, grad_norm=170.425, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.199e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:13:15,676 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:13:16,224 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:14:13,109 (trainer:732) INFO: 24epoch:train:5729-6086batch: iter_time=0.004, forward_time=0.100, loss_ctc=64.819, loss_att=34.485, acc=0.662, loss=43.586, backward_time=0.053, grad_norm=178.063, clip=100.000, loss_scale=586.577, optim_step_time=0.034, optim0_lr0=4.194e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:14:23,373 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:15:45,927 (trainer:732) INFO: 24epoch:train:6087-6444batch: iter_time=0.006, forward_time=0.101, loss_ctc=63.034, loss_att=33.419, acc=0.670, loss=42.303, backward_time=0.053, grad_norm=175.107, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.190e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:17:19,292 (trainer:732) INFO: 24epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.100, loss_ctc=64.204, loss_att=34.200, acc=0.663, loss=43.201, backward_time=0.054, grad_norm=178.130, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.185e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:17:56,277 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:18:52,769 (trainer:732) INFO: 24epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.100, loss_ctc=62.327, loss_att=33.064, acc=0.666, loss=41.843, backward_time=0.055, grad_norm=172.987, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.181e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:20:01,452 (trainer:338) INFO: 24epoch results: [train] iter_time=0.004, forward_time=0.101, loss_ctc=64.952, loss_att=34.592, acc=0.658, loss=43.700, backward_time=0.054, grad_norm=175.812, clip=100.000, loss_scale=562.928, optim_step_time=0.033, optim0_lr0=4.223e-05, train_time=0.258, time=30 minutes and 50.3 seconds, total_count=171864, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=40.161, cer_ctc=0.247, loss_att=20.306, acc=0.801, cer=0.136, wer=0.936, loss=26.262, time=14.71 seconds, total_count=1272, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.31 seconds, 
total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:20:05,249 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:20:05,267 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/14epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:20:05,267 (trainer:272) INFO: 25/100epoch started. Estimated time to finish: 1 day, 16 hours and 55 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:21:36,634 (trainer:732) INFO: 25epoch:train:1-358batch: iter_time=0.002, forward_time=0.101, loss_ctc=63.667, loss_att=33.743, acc=0.670, loss=42.720, backward_time=0.053, grad_norm=180.555, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.177e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:22:27,493 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:23:07,910 (trainer:732) INFO: 25epoch:train:359-716batch: iter_time=0.002, forward_time=0.100, loss_ctc=63.861, loss_att=33.778, acc=0.670, loss=42.803, backward_time=0.053, grad_norm=176.022, clip=100.000, loss_scale=530.592, optim_step_time=0.033, optim0_lr0=4.172e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:23:52,441 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:24:38,206 (trainer:732) INFO: 25epoch:train:717-1074batch: iter_time=0.001, forward_time=0.099, loss_ctc=61.863, loss_att=32.619, acc=0.673, loss=41.392, backward_time=0.053, grad_norm=176.065, clip=100.000, loss_scale=765.849, optim_step_time=0.033, optim0_lr0=4.168e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:26:10,380 (trainer:732) INFO: 25epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.102, loss_ctc=65.927, loss_att=35.071, acc=0.667, loss=44.328, backward_time=0.053, grad_norm=192.162, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.164e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:27:40,977 (trainer:732) INFO: 25epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.099, loss_ctc=61.995, loss_att=32.748, acc=0.673, loss=41.522, backward_time=0.053, grad_norm=173.561, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.159e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:29:11,312 (trainer:732) INFO: 25epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=62.779, loss_att=33.211, acc=0.672, loss=42.081, backward_time=0.053, grad_norm=181.179, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.155e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:30:28,269 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:30:42,258 (trainer:732) INFO: 25epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.099, loss_ctc=63.442, loss_att=33.638, acc=0.669, loss=42.580, backward_time=0.053, grad_norm=169.263, clip=100.000, loss_scale=472.560, optim_step_time=0.033, optim0_lr0=4.151e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:32:16,535 (trainer:732) INFO: 25epoch:train:2507-2864batch: iter_time=0.005, forward_time=0.103, loss_ctc=63.289, loss_att=33.552, acc=0.670, loss=42.473, backward_time=0.053, grad_norm=178.672, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.147e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:32:57,637 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:33:49,634 (trainer:732) INFO: 25epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.104, loss_ctc=62.442, loss_att=33.049, acc=0.674, loss=41.867, backward_time=0.054, grad_norm=177.426, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.142e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:35:22,225 (trainer:732) INFO: 25epoch:train:3223-3580batch: iter_time=0.002, forward_time=0.101, loss_ctc=65.573, loss_att=34.827, acc=0.669, loss=44.051, backward_time=0.055, grad_norm=179.130, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.138e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:36:54,695 (trainer:732) INFO: 25epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.101, loss_ctc=61.495, loss_att=32.436, acc=0.675, loss=41.154, backward_time=0.054, grad_norm=178.168, clip=100.000, 
loss_scale=256.000, optim_step_time=0.034, optim0_lr0=4.134e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:38:25,807 (trainer:732) INFO: 25epoch:train:3939-4296batch: iter_time=0.003, forward_time=0.099, loss_ctc=62.331, loss_att=32.859, acc=0.676, loss=41.700, backward_time=0.054, grad_norm=174.676, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.130e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:39:59,489 (trainer:732) INFO: 25epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.101, loss_ctc=62.813, loss_att=33.338, acc=0.676, loss=42.181, backward_time=0.053, grad_norm=173.110, clip=100.000, loss_scale=401.162, optim_step_time=0.033, optim0_lr0=4.126e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:40:01,239 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:41:31,802 (trainer:732) INFO: 25epoch:train:4655-5012batch: iter_time=0.007, forward_time=0.099, loss_ctc=59.182, loss_att=31.230, acc=0.679, loss=39.616, backward_time=0.053, grad_norm=173.054, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.121e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:43:04,785 (trainer:732) INFO: 25epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.101, loss_ctc=62.822, loss_att=33.342, acc=0.673, loss=42.186, backward_time=0.054, grad_norm=175.341, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.117e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:44:36,218 (trainer:732) INFO: 25epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.099, loss_ctc=60.614, loss_att=31.856, acc=0.682, loss=40.483, backward_time=0.054, grad_norm=171.922, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.113e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:46:08,668 (trainer:732) INFO: 25epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.099, loss_ctc=58.765, loss_att=30.911, acc=0.683, loss=39.267, backward_time=0.053, grad_norm=167.826, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.109e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:47:41,255 (trainer:732) INFO: 25epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.100, loss_ctc=62.800, loss_att=33.192, acc=0.679, loss=42.075, backward_time=0.053, grad_norm=178.747, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.105e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:49:06,261 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:49:13,831 (trainer:732) INFO: 25epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.099, loss_ctc=60.083, loss_att=31.587, acc=0.682, loss=40.136, backward_time=0.053, grad_norm=182.006, clip=100.000, loss_scale=973.804, optim_step_time=0.033, optim0_lr0=4.101e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:50:47,429 (trainer:732) INFO: 25epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.101, loss_ctc=62.285, loss_att=32.841, acc=0.682, loss=41.675, backward_time=0.053, grad_norm=176.563, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.097e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:51:56,151 (trainer:338) INFO: 25epoch results: [train] iter_time=0.004, forward_time=0.100, loss_ctc=62.363, loss_att=32.969, acc=0.675, loss=41.787, backward_time=0.053, grad_norm=176.764, clip=100.000, loss_scale=477.094, optim_step_time=0.033, optim0_lr0=4.136e-05, train_time=0.257, time=30 minutes and 42.8 seconds, total_count=179025, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=37.822, cer_ctc=0.232, loss_att=19.060, acc=0.815, cer=0.127, wer=0.917, loss=24.688, time=14.66 seconds, total_count=1325, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.42 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:51:59,919 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:51:59,937 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/15epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:51:59,938 (trainer:272) INFO: 26/100epoch started. 
Estimated time to finish: 1 day, 16 hours and 22 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:53:31,849 (trainer:732) INFO: 26epoch:train:1-358batch: iter_time=0.004, forward_time=0.099, loss_ctc=59.997, loss_att=31.546, acc=0.684, loss=40.082, backward_time=0.054, grad_norm=174.685, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.092e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:55:02,245 (trainer:732) INFO: 26epoch:train:359-716batch: iter_time=0.001, forward_time=0.099, loss_ctc=61.481, loss_att=32.203, acc=0.685, loss=40.986, backward_time=0.054, grad_norm=177.013, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.088e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:56:33,639 (trainer:732) INFO: 26epoch:train:717-1074batch: iter_time=0.002, forward_time=0.102, loss_ctc=58.068, loss_att=30.486, acc=0.686, loss=38.761, backward_time=0.053, grad_norm=169.308, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.084e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:57:53,763 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:58:05,161 (trainer:732) INFO: 26epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.102, loss_ctc=60.102, loss_att=31.548, acc=0.687, loss=40.114, backward_time=0.054, grad_norm=173.492, clip=100.000, loss_scale=479.731, optim_step_time=0.033, optim0_lr0=4.080e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 06:59:37,161 (trainer:732) INFO: 26epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.100, loss_ctc=61.833, loss_att=32.331, acc=0.689, loss=41.182, backward_time=0.053, grad_norm=175.228, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.076e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:01:08,102 (trainer:732) INFO: 26epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=59.529, loss_att=31.249, acc=0.689, loss=39.733, backward_time=0.053, grad_norm=173.075, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.072e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:01:49,081 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:02:40,892 (trainer:732) INFO: 26epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.102, loss_ctc=62.148, loss_att=32.986, acc=0.683, loss=41.735, backward_time=0.055, grad_norm=183.885, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.068e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:04:12,927 (trainer:732) INFO: 26epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.101, loss_ctc=60.978, loss_att=32.032, acc=0.687, loss=40.716, backward_time=0.054, grad_norm=175.853, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.064e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:05:45,372 (trainer:732) INFO: 26epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.100, loss_ctc=58.379, loss_att=30.623, acc=0.691, loss=38.950, backward_time=0.056, grad_norm=177.231, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=4.060e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:07:17,287 (trainer:732) INFO: 26epoch:train:3223-3580batch: iter_time=0.002, forward_time=0.101, loss_ctc=61.716, loss_att=32.454, acc=0.689, loss=41.233, backward_time=0.053, grad_norm=178.866, clip=100.000, loss_scale=394.011, optim_step_time=0.033, optim0_lr0=4.056e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:08:48,643 (trainer:732) INFO: 26epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.100, loss_ctc=60.736, loss_att=31.797, acc=0.691, loss=40.479, backward_time=0.053, grad_norm=177.477, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.052e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:10:20,374 (trainer:732) INFO: 26epoch:train:3939-4296batch: iter_time=0.004, 
forward_time=0.099, loss_ctc=59.835, loss_att=31.487, acc=0.687, loss=39.992, backward_time=0.054, grad_norm=180.459, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.048e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:11:51,472 (trainer:732) INFO: 26epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.099, loss_ctc=60.767, loss_att=32.004, acc=0.686, loss=40.633, backward_time=0.053, grad_norm=182.751, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=4.044e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:12:06,582 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:13:23,716 (trainer:732) INFO: 26epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.099, loss_ctc=59.695, loss_att=31.354, acc=0.692, loss=39.856, backward_time=0.054, grad_norm=173.991, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.040e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:14:41,557 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:14:57,170 (trainer:732) INFO: 26epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.102, loss_ctc=61.109, loss_att=32.245, acc=0.688, loss=40.904, backward_time=0.054, grad_norm=181.430, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.036e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:15:05,422 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:16:29,394 (trainer:732) INFO: 26epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.100, loss_ctc=60.292, loss_att=31.586, acc=0.693, loss=40.198, backward_time=0.055, grad_norm=177.672, clip=100.000, loss_scale=533.513, optim_step_time=0.033, optim0_lr0=4.032e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:18:02,721 (trainer:732) INFO: 26epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.101, loss_ctc=58.724, loss_att=30.788, acc=0.693, loss=39.169, backward_time=0.053, grad_norm=170.925, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.029e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:19:36,309 (trainer:732) INFO: 26epoch:train:6087-6444batch: iter_time=0.008, forward_time=0.100, loss_ctc=57.251, loss_att=29.926, acc=0.696, loss=38.123, backward_time=0.053, grad_norm=175.336, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.025e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:21:10,143 (trainer:732) INFO: 26epoch:train:6445-6802batch: iter_time=0.006, forward_time=0.102, loss_ctc=60.045, loss_att=31.601, acc=0.692, loss=40.134, backward_time=0.053, grad_norm=180.729, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.021e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:22:43,606 (trainer:732) INFO: 26epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.100, loss_ctc=59.403, loss_att=31.152, acc=0.696, loss=39.627, backward_time=0.054, grad_norm=177.343, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.017e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:23:52,335 (trainer:338) INFO: 26epoch results: [train] iter_time=0.004, forward_time=0.100, 
loss_ctc=60.081, loss_att=31.557, acc=0.689, loss=40.114, backward_time=0.054, grad_norm=176.830, clip=100.000, loss_scale=441.554, optim_step_time=0.033, optim0_lr0=4.054e-05, train_time=0.257, time=30 minutes and 44.32 seconds, total_count=186186, gpu_max_cached_mem_GB=28.451, [valid] loss_ctc=35.920, cer_ctc=0.217, loss_att=18.032, acc=0.825, cer=0.118, wer=0.907, loss=23.399, time=14.43 seconds, total_count=1378, gpu_max_cached_mem_GB=28.451, [att_plot] time=53.64 seconds, total_count=0, gpu_max_cached_mem_GB=28.451 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:23:56,204 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:23:56,222 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/16epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:23:56,222 (trainer:272) INFO: 27/100epoch started. Estimated time to finish: 1 day, 15 hours and 48 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:25:27,854 (trainer:732) INFO: 27epoch:train:1-358batch: iter_time=0.002, forward_time=0.101, loss_ctc=60.468, loss_att=31.558, acc=0.697, loss=40.231, backward_time=0.054, grad_norm=175.563, clip=100.000, loss_scale=677.899, optim_step_time=0.033, optim0_lr0=4.013e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:25:59,066 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:26:58,726 (trainer:732) INFO: 27epoch:train:359-716batch: iter_time=7.629e-04, forward_time=0.101, loss_ctc=58.343, loss_att=30.404, acc=0.699, loss=38.786, backward_time=0.053, grad_norm=174.261, clip=100.000, loss_scale=685.535, optim_step_time=0.033, optim0_lr0=4.009e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:28:30,965 (trainer:732) INFO: 27epoch:train:717-1074batch: iter_time=0.001, forward_time=0.102, loss_ctc=58.864, loss_att=30.814, acc=0.696, loss=39.229, backward_time=0.053, grad_norm=176.034, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.005e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:30:02,287 (trainer:732) INFO: 27epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.101, loss_ctc=59.406, loss_att=30.995, acc=0.698, loss=39.518, backward_time=0.053, grad_norm=182.761, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=4.001e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:31:34,048 (trainer:732) INFO: 27epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.101, loss_ctc=57.299, loss_att=29.852, acc=0.701, loss=38.086, backward_time=0.053, grad_norm=173.649, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.998e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:33:05,868 (trainer:732) INFO: 27epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.099, loss_ctc=57.092, loss_att=29.797, acc=0.701, loss=37.986, backward_time=0.053, grad_norm=178.375, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.994e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:33:51,786 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:34:39,082 (trainer:732) INFO: 27epoch:train:2149-2506batch: iter_time=0.005, forward_time=0.101, loss_ctc=57.646, loss_att=30.078, acc=0.701, loss=38.348, backward_time=0.053, grad_norm=179.359, clip=100.000, loss_scale=549.184, optim_step_time=0.033, optim0_lr0=3.990e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:36:10,321 (trainer:732) INFO: 27epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.099, loss_ctc=56.882, loss_att=29.760, acc=0.702, loss=37.897, backward_time=0.054, grad_norm=179.720, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.033, optim0_lr0=3.986e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:37:42,089 (trainer:732) INFO: 27epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.099, loss_ctc=58.760, loss_att=30.682, acc=0.701, loss=39.105, backward_time=0.053, grad_norm=182.575, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.033, optim0_lr0=3.983e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:37:53,793 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:39:13,595 (trainer:732) INFO: 27epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.099, loss_ctc=58.522, loss_att=30.607, acc=0.702, loss=38.982, backward_time=0.053, grad_norm=177.110, clip=100.000, loss_scale=575.104, optim_step_time=0.033, optim0_lr0=3.979e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:40:45,913 (trainer:732) INFO: 27epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.099, loss_ctc=56.719, loss_att=29.643, acc=0.703, loss=37.766, backward_time=0.053, grad_norm=177.267, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.975e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:42:18,364 (trainer:732) INFO: 27epoch:train:3939-4296batch: iter_time=0.003, forward_time=0.100, loss_ctc=59.846, loss_att=31.248, acc=0.701, loss=39.827, backward_time=0.054, grad_norm=178.105, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.971e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:43:45,078 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:43:48,860 (trainer:732) INFO: 27epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.098, loss_ctc=57.536, loss_att=30.065, acc=0.701, loss=38.306, backward_time=0.055, grad_norm=173.823, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.968e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:45:21,786 (trainer:732) INFO: 27epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.100, loss_ctc=58.155, loss_att=30.554, acc=0.701, loss=38.834, backward_time=0.053, grad_norm=178.054, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.964e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:46:28,351 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:46:52,286 (trainer:732) INFO: 27epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.098, loss_ctc=58.115, loss_att=30.340, acc=0.702, loss=38.672, backward_time=0.053, grad_norm=174.987, clip=100.000, loss_scale=523.473, optim_step_time=0.033, optim0_lr0=3.960e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:48:24,544 (trainer:732) INFO: 27epoch:train:5371-5728batch: iter_time=0.007, forward_time=0.098, loss_ctc=54.693, loss_att=28.548, acc=0.709, loss=36.391, backward_time=0.054, grad_norm=175.025, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.956e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:49:56,242 (trainer:732) INFO: 27epoch:train:5729-6086batch: iter_time=0.004, forward_time=0.099, loss_ctc=58.341, loss_att=30.603, acc=0.703, loss=38.924, backward_time=0.053, grad_norm=184.301, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.953e-05, train_time=0.256 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:51:28,829 (trainer:732) INFO: 27epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.099, loss_ctc=57.117, loss_att=29.803, acc=0.707, loss=37.997, backward_time=0.053, grad_norm=179.392, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.949e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:52:47,065 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:53:01,444 (trainer:732) INFO: 27epoch:train:6445-6802batch: iter_time=0.006, forward_time=0.100, loss_ctc=58.042, loss_att=30.397, acc=0.706, loss=38.690, backward_time=0.053, grad_norm=179.972, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.945e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:54:35,308 (trainer:732) INFO: 27epoch:train:6803-7160batch: iter_time=0.006, forward_time=0.101, loss_ctc=60.462, loss_att=31.527, acc=0.705, loss=40.207, backward_time=0.053, grad_norm=180.591, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.942e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:55:44,012 (trainer:338) INFO: 27epoch results: [train] iter_time=0.004, forward_time=0.100, loss_ctc=58.098, loss_att=30.354, acc=0.702, loss=38.677, backward_time=0.053, grad_norm=178.043, clip=100.000, loss_scale=585.746, optim_step_time=0.033, optim0_lr0=3.977e-05, train_time=0.257, time=30 minutes and 39.74 seconds, total_count=193347, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=34.034, cer_ctc=0.203, loss_att=17.108, acc=0.834, cer=0.111, wer=0.898, loss=22.186, time=14.52 seconds, total_count=1431, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.53 seconds, 
total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:55:47,823 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:55:47,843 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/17epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:55:47,843 (trainer:272) INFO: 28/100epoch started. Estimated time to finish: 1 day, 15 hours and 15 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:57:20,847 (trainer:732) INFO: 28epoch:train:1-358batch: iter_time=0.003, forward_time=0.102, loss_ctc=56.255, loss_att=29.227, acc=0.712, loss=37.335, backward_time=0.054, grad_norm=175.677, clip=100.000, loss_scale=859.531, optim_step_time=0.033, optim0_lr0=3.938e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 07:58:52,492 (trainer:732) INFO: 28epoch:train:359-716batch: iter_time=0.002, forward_time=0.101, loss_ctc=57.586, loss_att=30.063, acc=0.708, loss=38.320, backward_time=0.054, grad_norm=177.902, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.033, optim0_lr0=3.934e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:00:04,734 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:00:23,494 (trainer:732) INFO: 28epoch:train:717-1074batch: iter_time=0.002, forward_time=0.100, loss_ctc=56.447, loss_att=29.443, acc=0.709, loss=37.544, backward_time=0.055, grad_norm=175.496, clip=100.000, loss_scale=920.739, optim_step_time=0.033, optim0_lr0=3.931e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:01:56,059 (trainer:732) INFO: 28epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.102, loss_ctc=57.130, loss_att=29.892, acc=0.704, loss=38.064, backward_time=0.054, grad_norm=177.448, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.927e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:03:26,523 (trainer:732) INFO: 28epoch:train:1433-1790batch: iter_time=6.272e-04, forward_time=0.100, loss_ctc=58.765, loss_att=30.483, acc=0.709, loss=38.967, backward_time=0.053, grad_norm=179.668, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.924e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:04:58,459 (trainer:732) INFO: 28epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.100, loss_ctc=56.004, loss_att=29.013, acc=0.712, loss=37.110, backward_time=0.053, grad_norm=172.659, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.920e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:06:30,795 (trainer:732) INFO: 28epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.101, loss_ctc=55.212, loss_att=28.623, acc=0.715, loss=36.600, backward_time=0.054, grad_norm=176.756, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.916e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:08:03,122 (trainer:732) INFO: 28epoch:train:2507-2864batch: iter_time=0.003, 
forward_time=0.101, loss_ctc=57.117, loss_att=29.661, acc=0.711, loss=37.898, backward_time=0.054, grad_norm=178.630, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.913e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:08:39,660 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:09:34,260 (trainer:732) INFO: 28epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.099, loss_ctc=55.127, loss_att=28.584, acc=0.715, loss=36.547, backward_time=0.054, grad_norm=179.329, clip=100.000, loss_scale=517.737, optim_step_time=0.033, optim0_lr0=3.909e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:11:07,436 (trainer:732) INFO: 28epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.101, loss_ctc=56.333, loss_att=29.279, acc=0.711, loss=37.395, backward_time=0.053, grad_norm=181.035, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.906e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:12:39,729 (trainer:732) INFO: 28epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.100, loss_ctc=55.860, loss_att=29.161, acc=0.713, loss=37.171, backward_time=0.053, grad_norm=183.286, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.902e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:14:13,314 (trainer:732) INFO: 28epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.101, loss_ctc=57.357, loss_att=29.828, acc=0.712, loss=38.087, backward_time=0.053, grad_norm=180.531, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.899e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:15:44,917 (trainer:732) INFO: 28epoch:train:4297-4654batch: iter_time=0.004, 
forward_time=0.100, loss_ctc=56.852, loss_att=29.711, acc=0.709, loss=37.854, backward_time=0.054, grad_norm=177.890, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.895e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:17:17,738 (trainer:732) INFO: 28epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.101, loss_ctc=56.907, loss_att=29.641, acc=0.714, loss=37.821, backward_time=0.053, grad_norm=179.636, clip=100.000, loss_scale=519.151, optim_step_time=0.033, optim0_lr0=3.892e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:17:55,119 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:18:49,208 (trainer:732) INFO: 28epoch:train:5013-5370batch: iter_time=0.003, forward_time=0.100, loss_ctc=56.963, loss_att=29.565, acc=0.715, loss=37.784, backward_time=0.054, grad_norm=183.220, clip=100.000, loss_scale=718.521, optim_step_time=0.033, optim0_lr0=3.888e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:19:01,054 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:19:03,693 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:20:22,366 (trainer:732) INFO: 28epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.102, loss_ctc=55.467, loss_att=28.843, acc=0.713, loss=36.830, backward_time=0.054, grad_norm=177.245, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.885e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:21:54,515 (trainer:732) INFO: 28epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.100, loss_ctc=56.892, loss_att=29.556, acc=0.716, loss=37.756, backward_time=0.053, grad_norm=181.504, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.881e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:23:22,630 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:23:25,949 (trainer:732) INFO: 28epoch:train:6087-6444batch: iter_time=0.004, forward_time=0.099, loss_ctc=54.279, loss_att=28.196, acc=0.718, loss=36.021, backward_time=0.053, grad_norm=174.151, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.878e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:24:58,558 (trainer:732) INFO: 28epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.098, loss_ctc=52.825, loss_att=27.358, acc=0.721, loss=34.998, backward_time=0.053, grad_norm=172.664, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.874e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:26:32,114 (trainer:732) INFO: 28epoch:train:6803-7160batch: iter_time=0.005, forward_time=0.101, loss_ctc=56.797, loss_att=29.664, acc=0.715, loss=37.804, backward_time=0.055, grad_norm=183.879, clip=100.000, 
loss_scale=516.291, optim_step_time=0.033, optim0_lr0=3.871e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:27:41,011 (trainer:338) INFO: 28epoch results: [train] iter_time=0.004, forward_time=0.100, loss_ctc=56.297, loss_att=29.284, acc=0.713, loss=37.388, backward_time=0.054, grad_norm=178.430, clip=100.000, loss_scale=586.604, optim_step_time=0.033, optim0_lr0=3.904e-05, train_time=0.257, time=30 minutes and 44.97 seconds, total_count=200508, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=32.656, cer_ctc=0.193, loss_att=16.424, acc=0.841, cer=0.105, wer=0.886, loss=21.293, time=14.55 seconds, total_count=1484, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.65 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:27:44,932 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:27:44,951 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/18epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:27:44,951 (trainer:272) INFO: 29/100epoch started. Estimated time to finish: 1 day, 14 hours and 42 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:28:31,615 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:29:17,329 (trainer:732) INFO: 29epoch:train:1-358batch: iter_time=0.003, forward_time=0.101, loss_ctc=54.311, loss_att=28.144, acc=0.719, loss=35.994, backward_time=0.053, grad_norm=174.842, clip=100.000, loss_scale=768.717, optim_step_time=0.034, optim0_lr0=3.867e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:30:48,595 (trainer:732) INFO: 29epoch:train:359-716batch: iter_time=6.153e-04, forward_time=0.101, loss_ctc=56.436, loss_att=29.256, acc=0.718, loss=37.410, backward_time=0.053, grad_norm=177.308, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.864e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:32:20,656 (trainer:732) INFO: 29epoch:train:717-1074batch: iter_time=0.002, forward_time=0.101, loss_ctc=56.751, loss_att=29.509, acc=0.718, loss=37.682, backward_time=0.054, grad_norm=182.265, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.860e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:33:50,660 (trainer:732) INFO: 29epoch:train:1075-1432batch: iter_time=8.016e-04, forward_time=0.099, loss_ctc=54.165, loss_att=28.066, acc=0.721, loss=35.896, backward_time=0.053, grad_norm=177.369, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.857e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:35:21,179 (trainer:732) INFO: 29epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.099, loss_ctc=54.244, loss_att=28.114, acc=0.720, loss=35.953, backward_time=0.053, grad_norm=176.749, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.853e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:36:53,939 (trainer:732) INFO: 29epoch:train:1791-2148batch: iter_time=0.003, 
forward_time=0.102, loss_ctc=55.431, loss_att=28.710, acc=0.719, loss=36.727, backward_time=0.054, grad_norm=176.714, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.850e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:37:09,952 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:38:25,667 (trainer:732) INFO: 29epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.100, loss_ctc=56.201, loss_att=29.108, acc=0.718, loss=37.236, backward_time=0.053, grad_norm=178.838, clip=100.000, loss_scale=555.025, optim_step_time=0.033, optim0_lr0=3.847e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:38:53,458 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:39:58,902 (trainer:732) INFO: 29epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.102, loss_ctc=55.964, loss_att=29.086, acc=0.721, loss=37.149, backward_time=0.053, grad_norm=185.114, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.843e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:41:31,477 (trainer:732) INFO: 29epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.100, loss_ctc=54.685, loss_att=28.232, acc=0.723, loss=36.168, backward_time=0.054, grad_norm=186.323, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.840e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:43:02,998 (trainer:732) INFO: 29epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.099, loss_ctc=55.640, loss_att=28.820, acc=0.721, loss=36.866, backward_time=0.053, grad_norm=185.735, 
clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.836e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:44:36,257 (trainer:732) INFO: 29epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.100, loss_ctc=54.339, loss_att=28.207, acc=0.724, loss=36.046, backward_time=0.053, grad_norm=179.102, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.833e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:46:07,430 (trainer:732) INFO: 29epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.098, loss_ctc=54.177, loss_att=28.000, acc=0.722, loss=35.853, backward_time=0.053, grad_norm=175.003, clip=100.000, loss_scale=633.564, optim_step_time=0.033, optim0_lr0=3.830e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:46:17,915 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:47:38,755 (trainer:732) INFO: 29epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.098, loss_ctc=54.048, loss_att=27.955, acc=0.725, loss=35.783, backward_time=0.055, grad_norm=176.343, clip=100.000, loss_scale=570.801, optim_step_time=0.033, optim0_lr0=3.826e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:49:11,846 (trainer:732) INFO: 29epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.101, loss_ctc=54.278, loss_att=28.062, acc=0.723, loss=35.927, backward_time=0.054, grad_norm=179.787, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.823e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:50:43,585 (trainer:732) INFO: 29epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.099, loss_ctc=54.196, loss_att=28.061, acc=0.723, loss=35.901, backward_time=0.054, grad_norm=178.919, 
clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.820e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:51:42,151 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:52:16,712 (trainer:732) INFO: 29epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.101, loss_ctc=53.337, loss_att=27.694, acc=0.722, loss=35.386, backward_time=0.053, grad_norm=174.338, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.816e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:53:15,614 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:53:49,712 (trainer:732) INFO: 29epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.099, loss_ctc=53.247, loss_att=27.486, acc=0.725, loss=35.214, backward_time=0.055, grad_norm=170.546, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.813e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:55:09,505 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:55:23,793 (trainer:732) INFO: 29epoch:train:6087-6444batch: iter_time=0.008, forward_time=0.101, loss_ctc=54.460, loss_att=28.239, acc=0.727, loss=36.105, backward_time=0.055, grad_norm=177.568, clip=100.000, loss_scale=583.709, optim_step_time=0.033, optim0_lr0=3.810e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:56:55,466 (trainer:732) INFO: 29epoch:train:6445-6802batch: iter_time=0.006, forward_time=0.098, loss_ctc=54.028, loss_att=28.019, acc=0.725, loss=35.821, backward_time=0.055, grad_norm=181.753, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.807e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:58:29,117 (trainer:732) INFO: 29epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.100, loss_ctc=54.782, loss_att=28.433, acc=0.725, loss=36.338, backward_time=0.054, grad_norm=182.075, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.803e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:59:38,359 (trainer:338) INFO: 29epoch results: [train] iter_time=0.004, forward_time=0.100, loss_ctc=54.723, loss_att=28.353, acc=0.722, loss=36.264, backward_time=0.054, grad_norm=178.832, clip=100.000, loss_scale=539.542, optim_step_time=0.033, optim0_lr0=3.835e-05, train_time=0.257, time=30 minutes and 44.79 seconds, total_count=207669, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=31.366, cer_ctc=0.183, loss_att=15.727, acc=0.849, cer=0.101, wer=0.874, loss=20.419, time=14.53 seconds, total_count=1537, gpu_max_cached_mem_GB=28.453, [att_plot] time=54.09 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:59:41,919 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 
2023-05-13 08:59:41,936 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/19epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 08:59:41,937 (trainer:272) INFO: 30/100epoch started. Estimated time to finish: 1 day, 14 hours and 8 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:01:15,325 (trainer:732) INFO: 30epoch:train:1-358batch: iter_time=0.003, forward_time=0.105, loss_ctc=52.101, loss_att=26.907, acc=0.729, loss=34.465, backward_time=0.054, grad_norm=173.760, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.800e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:02:47,325 (trainer:732) INFO: 30epoch:train:359-716batch: iter_time=6.020e-04, forward_time=0.104, loss_ctc=54.540, loss_att=28.117, acc=0.729, loss=36.044, backward_time=0.053, grad_norm=181.573, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.797e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:04:20,392 (trainer:732) INFO: 30epoch:train:717-1074batch: iter_time=3.797e-04, forward_time=0.105, loss_ctc=55.670, loss_att=28.738, acc=0.727, loss=36.817, backward_time=0.053, grad_norm=182.238, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.793e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:05:52,388 (trainer:732) INFO: 30epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.101, loss_ctc=53.435, loss_att=27.679, acc=0.727, loss=35.406, backward_time=0.054, grad_norm=173.178, clip=100.000, loss_scale=803.754, optim_step_time=0.033, optim0_lr0=3.790e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:07:17,243 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:07:23,025 (trainer:732) INFO: 30epoch:train:1433-1790batch: iter_time=4.256e-04, forward_time=0.101, loss_ctc=54.752, loss_att=28.227, acc=0.728, loss=36.185, backward_time=0.054, grad_norm=177.314, clip=100.000, loss_scale=991.014, optim_step_time=0.033, optim0_lr0=3.787e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:08:55,415 (trainer:732) INFO: 30epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.102, loss_ctc=51.190, loss_att=26.358, acc=0.731, loss=33.807, backward_time=0.053, grad_norm=174.386, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.784e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:10:27,765 (trainer:732) INFO: 30epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.102, loss_ctc=52.225, loss_att=26.965, acc=0.729, loss=34.543, backward_time=0.054, grad_norm=173.820, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.780e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:11:29,559 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:12:01,335 (trainer:732) INFO: 30epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.104, loss_ctc=55.580, loss_att=28.799, acc=0.728, loss=36.833, backward_time=0.054, grad_norm=184.417, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.777e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:13:33,851 (trainer:732) INFO: 30epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.102, loss_ctc=53.318, loss_att=27.457, acc=0.733, loss=35.215, backward_time=0.053, grad_norm=176.894, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.774e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:15:06,827 (trainer:732) INFO: 30epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.103, loss_ctc=52.802, loss_att=27.255, acc=0.732, loss=34.919, backward_time=0.053, grad_norm=178.431, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.771e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:16:11,440 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:16:39,995 (trainer:732) INFO: 30epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.103, loss_ctc=52.389, loss_att=27.007, acc=0.732, loss=34.622, backward_time=0.053, grad_norm=182.938, clip=100.000, loss_scale=599.485, optim_step_time=0.033, optim0_lr0=3.768e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:18:13,279 (trainer:732) INFO: 30epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.103, loss_ctc=53.196, loss_att=27.470, acc=0.730, loss=35.188, backward_time=0.054, grad_norm=179.392, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.764e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:19:45,478 (trainer:732) INFO: 30epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.101, loss_ctc=53.170, loss_att=27.411, acc=0.732, loss=35.139, backward_time=0.053, grad_norm=174.758, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.761e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:21:18,671 (trainer:732) INFO: 30epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.102, loss_ctc=51.255, loss_att=26.538, acc=0.733, loss=33.953, backward_time=0.053, grad_norm=180.182, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.758e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:22:50,309 (trainer:732) INFO: 30epoch:train:5013-5370batch: iter_time=0.002, forward_time=0.101, loss_ctc=54.620, loss_att=28.309, acc=0.729, loss=36.202, backward_time=0.053, grad_norm=182.752, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.755e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:23:05,051 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:24:24,007 (trainer:732) INFO: 30epoch:train:5371-5728batch: iter_time=0.008, forward_time=0.101, loss_ctc=50.650, loss_att=26.078, acc=0.735, loss=33.450, backward_time=0.053, grad_norm=174.483, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.752e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:25:37,662 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:25:56,813 (trainer:732) INFO: 30epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.101, loss_ctc=51.683, loss_att=26.773, acc=0.732, loss=34.246, backward_time=0.053, grad_norm=178.247, clip=100.000, loss_scale=775.888, optim_step_time=0.033, optim0_lr0=3.749e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:27:29,358 (trainer:732) INFO: 30epoch:train:6087-6444batch: iter_time=0.003, forward_time=0.102, loss_ctc=55.149, loss_att=28.637, acc=0.726, loss=36.591, backward_time=0.053, grad_norm=187.515, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.746e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:28:33,472 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:29:03,066 (trainer:732) INFO: 30epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.103, loss_ctc=54.224, loss_att=28.000, acc=0.734, loss=35.867, backward_time=0.055, grad_norm=186.971, clip=100.000, loss_scale=431.686, optim_step_time=0.033, optim0_lr0=3.742e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:29:37,899 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:30:37,624 (trainer:732) INFO: 30epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.103, loss_ctc=53.734, loss_att=27.781, acc=0.733, loss=35.567, backward_time=0.053, grad_norm=181.467, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.739e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:31:46,893 (trainer:338) INFO: 30epoch results: [train] iter_time=0.004, forward_time=0.102, loss_ctc=53.251, loss_att=27.507, acc=0.730, loss=35.230, backward_time=0.053, grad_norm=179.236, clip=100.000, loss_scale=551.167, optim_step_time=0.033, optim0_lr0=3.769e-05, train_time=0.259, time=30 minutes and 56.32 seconds, total_count=214830, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=30.169, cer_ctc=0.175, loss_att=15.125, acc=0.855, cer=0.097, wer=0.868, loss=19.638, time=14.74 seconds, total_count=1590, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.89 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:31:50,692 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:31:50,713 (trainer:440) INFO: The model files were removed: 
exp/asr_train_raw_bpe2000_sp/20epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:31:50,713 (trainer:272) INFO: 31/100epoch started. Estimated time to finish: 1 day, 13 hours and 36 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:33:21,695 (trainer:732) INFO: 31epoch:train:1-358batch: iter_time=0.003, forward_time=0.099, loss_ctc=53.830, loss_att=27.682, acc=0.732, loss=35.527, backward_time=0.053, grad_norm=176.751, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.736e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:34:52,045 (trainer:732) INFO: 31epoch:train:359-716batch: iter_time=0.002, forward_time=0.099, loss_ctc=51.428, loss_att=26.457, acc=0.736, loss=33.948, backward_time=0.053, grad_norm=178.007, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.733e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:36:22,972 (trainer:732) INFO: 31epoch:train:717-1074batch: iter_time=0.001, forward_time=0.100, loss_ctc=53.660, loss_att=27.547, acc=0.736, loss=35.381, backward_time=0.053, grad_norm=184.401, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.730e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:37:54,436 (trainer:732) INFO: 31epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=52.969, loss_att=27.338, acc=0.735, loss=35.027, backward_time=0.054, grad_norm=182.060, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=3.727e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:39:25,367 (trainer:732) INFO: 31epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.099, loss_ctc=51.459, loss_att=26.520, acc=0.737, loss=34.002, backward_time=0.053, grad_norm=176.666, clip=100.000, loss_scale=442.637, 
optim_step_time=0.033, optim0_lr0=3.724e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:40:54,049 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:40:58,347 (trainer:732) INFO: 31epoch:train:1791-2148batch: iter_time=0.006, forward_time=0.100, loss_ctc=53.518, loss_att=27.716, acc=0.735, loss=35.456, backward_time=0.053, grad_norm=182.395, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.721e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:42:30,531 (trainer:732) INFO: 31epoch:train:2149-2506batch: iter_time=8.853e-04, forward_time=0.101, loss_ctc=53.942, loss_att=27.809, acc=0.735, loss=35.649, backward_time=0.054, grad_norm=181.559, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.718e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:44:00,975 (trainer:732) INFO: 31epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.098, loss_ctc=50.490, loss_att=25.921, acc=0.742, loss=33.292, backward_time=0.055, grad_norm=175.198, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.715e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:45:33,543 (trainer:732) INFO: 31epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.100, loss_ctc=51.127, loss_att=26.393, acc=0.740, loss=33.813, backward_time=0.054, grad_norm=174.185, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.712e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:47:04,985 (trainer:732) INFO: 31epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.098, loss_ctc=49.843, loss_att=25.541, acc=0.742, 
loss=32.831, backward_time=0.055, grad_norm=179.807, clip=100.000, loss_scale=584.939, optim_step_time=0.033, optim0_lr0=3.708e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:47:39,335 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:48:36,297 (trainer:732) INFO: 31epoch:train:3581-3938batch: iter_time=0.002, forward_time=0.099, loss_ctc=53.189, loss_att=27.371, acc=0.737, loss=35.117, backward_time=0.054, grad_norm=179.354, clip=100.000, loss_scale=705.613, optim_step_time=0.033, optim0_lr0=3.705e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:50:08,071 (trainer:732) INFO: 31epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.098, loss_ctc=50.518, loss_att=26.006, acc=0.738, loss=33.360, backward_time=0.054, grad_norm=172.558, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.702e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:51:40,050 (trainer:732) INFO: 31epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.099, loss_ctc=52.445, loss_att=27.087, acc=0.734, loss=34.694, backward_time=0.053, grad_norm=183.479, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=3.699e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:52:46,535 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:53:12,029 (trainer:732) INFO: 31epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.100, loss_ctc=52.676, loss_att=27.234, acc=0.737, loss=34.867, backward_time=0.053, grad_norm=189.876, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.696e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:53:18,096 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:54:44,353 (trainer:732) INFO: 31epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.099, loss_ctc=50.747, loss_att=26.112, acc=0.736, loss=33.503, backward_time=0.053, grad_norm=177.803, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.693e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:56:17,746 (trainer:732) INFO: 31epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.101, loss_ctc=52.326, loss_att=27.003, acc=0.742, loss=34.600, backward_time=0.054, grad_norm=183.253, clip=100.000, loss_scale=529.162, optim_step_time=0.033, optim0_lr0=3.690e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:56:21,358 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:57:51,432 (trainer:732) INFO: 31epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.100, loss_ctc=51.178, loss_att=26.340, acc=0.743, loss=33.791, backward_time=0.053, grad_norm=180.962, clip=100.000, loss_scale=529.210, optim_step_time=0.033, optim0_lr0=3.687e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 09:59:25,334 (trainer:732) INFO: 31epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.100, loss_ctc=51.553, loss_att=26.536, acc=0.740, loss=34.041, backward_time=0.054, grad_norm=184.351, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.684e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:00:59,156 (trainer:732) INFO: 31epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.099, loss_ctc=50.910, loss_att=26.228, acc=0.741, loss=33.632, backward_time=0.054, grad_norm=182.945, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.681e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:02:31,899 (trainer:732) INFO: 31epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.099, loss_ctc=51.843, loss_att=26.747, acc=0.742, loss=34.276, backward_time=0.053, grad_norm=180.657, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.678e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:03:40,886 (trainer:338) INFO: 31epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=51.961, loss_att=26.768, acc=0.738, loss=34.326, backward_time=0.054, grad_norm=180.308, clip=100.000, loss_scale=472.343, optim_step_time=0.033, optim0_lr0=3.707e-05, train_time=0.257, time=30 minutes and 41.83 seconds, total_count=221991, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=28.754, cer_ctc=0.167, loss_att=14.462, acc=0.862, cer=0.091, 
wer=0.851, loss=18.750, time=14.55 seconds, total_count=1643, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.78 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:03:44,725 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:03:44,743 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/21epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:03:44,744 (trainer:272) INFO: 32/100epoch started. Estimated time to finish: 1 day, 13 hours and 3 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:05:17,743 (trainer:732) INFO: 32epoch:train:1-358batch: iter_time=0.003, forward_time=0.101, loss_ctc=52.986, loss_att=27.162, acc=0.744, loss=34.909, backward_time=0.054, grad_norm=185.235, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.675e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:06:41,005 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:06:47,940 (trainer:732) INFO: 32epoch:train:359-716batch: iter_time=0.002, forward_time=0.098, loss_ctc=50.694, loss_att=26.091, acc=0.741, loss=33.472, backward_time=0.053, grad_norm=173.759, clip=100.000, loss_scale=666.891, optim_step_time=0.033, optim0_lr0=3.672e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:08:18,897 (trainer:732) INFO: 32epoch:train:717-1074batch: iter_time=0.001, forward_time=0.100, loss_ctc=52.591, loss_att=27.053, acc=0.740, loss=34.715, backward_time=0.053, grad_norm=177.671, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.670e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:09:50,448 (trainer:732) INFO: 32epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=50.256, loss_att=25.790, acc=0.745, loss=33.130, backward_time=0.053, grad_norm=177.015, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.667e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:10:49,452 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:11:21,672 (trainer:732) INFO: 32epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.099, loss_ctc=49.757, loss_att=25.411, acc=0.745, loss=32.715, backward_time=0.053, grad_norm=176.045, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.664e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:12:53,025 (trainer:732) INFO: 32epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.100, loss_ctc=50.586, loss_att=25.910, acc=0.746, loss=33.312, backward_time=0.054, grad_norm=178.362, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.661e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:14:23,140 (trainer:732) INFO: 32epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.098, loss_ctc=51.312, loss_att=26.402, acc=0.742, loss=33.875, backward_time=0.053, grad_norm=178.090, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.658e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:15:42,591 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:15:54,926 (trainer:732) INFO: 32epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.099, loss_ctc=51.673, loss_att=26.608, acc=0.743, loss=34.128, backward_time=0.053, grad_norm=181.919, clip=100.000, loss_scale=694.140, optim_step_time=0.033, optim0_lr0=3.655e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:16:13,808 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:17:26,733 (trainer:732) INFO: 32epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.100, loss_ctc=49.595, loss_att=25.544, acc=0.746, loss=32.759, backward_time=0.055, grad_norm=179.255, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.652e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:18:59,761 (trainer:732) INFO: 32epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.100, loss_ctc=49.370, loss_att=25.432, acc=0.745, loss=32.613, backward_time=0.056, grad_norm=174.975, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.649e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:20:33,209 (trainer:732) INFO: 32epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.101, loss_ctc=53.295, loss_att=27.470, acc=0.742, loss=35.218, backward_time=0.056, grad_norm=183.685, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.646e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:22:04,186 (trainer:732) INFO: 32epoch:train:3939-4296batch: iter_time=0.002, forward_time=0.099, loss_ctc=51.140, loss_att=26.354, acc=0.743, loss=33.790, backward_time=0.054, grad_norm=176.864, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.643e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:23:36,194 (trainer:732) INFO: 32epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.099, loss_ctc=51.648, loss_att=26.551, acc=0.746, loss=34.080, backward_time=0.054, grad_norm=183.415, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.640e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:24:47,122 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:24:52,900 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:25:09,519 (trainer:732) INFO: 32epoch:train:4655-5012batch: iter_time=0.009, forward_time=0.099, loss_ctc=49.762, loss_att=25.542, acc=0.747, loss=32.808, backward_time=0.053, grad_norm=173.847, clip=100.000, loss_scale=668.325, optim_step_time=0.033, optim0_lr0=3.638e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:26:40,793 (trainer:732) INFO: 32epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.098, loss_ctc=51.491, loss_att=26.591, acc=0.743, loss=34.061, backward_time=0.055, grad_norm=181.097, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.635e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:28:12,849 (trainer:732) INFO: 32epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.099, loss_ctc=50.975, loss_att=26.269, acc=0.745, loss=33.681, backward_time=0.053, grad_norm=178.896, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.632e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:29:47,813 (trainer:732) INFO: 32epoch:train:5729-6086batch: iter_time=0.012, forward_time=0.100, loss_ctc=49.137, loss_att=25.295, acc=0.747, loss=32.448, backward_time=0.053, grad_norm=176.610, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.629e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:31:20,825 (trainer:732) INFO: 32epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.098, loss_ctc=49.225, loss_att=25.202, acc=0.750, loss=32.409, backward_time=0.053, grad_norm=176.650, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.626e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:32:52,580 (trainer:732) INFO: 32epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.098, loss_ctc=49.897, loss_att=25.620, acc=0.747, loss=32.903, backward_time=0.053, grad_norm=177.014, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.623e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:34:25,549 (trainer:732) INFO: 32epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.099, loss_ctc=50.185, loss_att=25.836, acc=0.747, loss=33.141, backward_time=0.054, grad_norm=181.836, clip=100.000, loss_scale=845.229, optim_step_time=0.033, optim0_lr0=3.620e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:35:34,686 (trainer:338) INFO: 32epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=50.762, loss_att=26.097, acc=0.745, loss=33.497, backward_time=0.054, grad_norm=178.608, clip=100.000, loss_scale=553.343, optim_step_time=0.033, optim0_lr0=3.648e-05, train_time=0.257, time=30 minutes and 41.55 seconds, total_count=229152, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=27.723, cer_ctc=0.161, loss_att=14.010, acc=0.867, cer=0.088, wer=0.843, loss=18.123, time=14.69 seconds, total_count=1696, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.7 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:35:38,529 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:35:38,549 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/22epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:35:38,549 (trainer:272) INFO: 33/100epoch started. 
Estimated time to finish: 1 day, 12 hours and 30 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:36:14,279 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:37:10,467 (trainer:732) INFO: 33epoch:train:1-358batch: iter_time=0.003, forward_time=0.100, loss_ctc=51.699, loss_att=26.559, acc=0.748, loss=34.101, backward_time=0.054, grad_norm=184.481, clip=100.000, loss_scale=714.218, optim_step_time=0.033, optim0_lr0=3.618e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:38:41,025 (trainer:732) INFO: 33epoch:train:359-716batch: iter_time=0.003, forward_time=0.098, loss_ctc=49.284, loss_att=25.298, acc=0.747, loss=32.494, backward_time=0.053, grad_norm=174.887, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.615e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:39:49,431 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:40:12,844 (trainer:732) INFO: 33epoch:train:717-1074batch: iter_time=3.059e-04, forward_time=0.101, loss_ctc=53.959, loss_att=27.707, acc=0.746, loss=35.582, backward_time=0.053, grad_norm=187.396, clip=100.000, loss_scale=447.462, optim_step_time=0.033, optim0_lr0=3.612e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:40:22,669 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:41:45,141 (trainer:732) INFO: 33epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.101, loss_ctc=50.866, loss_att=26.103, acc=0.750, loss=33.532, backward_time=0.054, grad_norm=182.325, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.609e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:43:15,736 (trainer:732) INFO: 33epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.098, loss_ctc=46.506, loss_att=23.669, acc=0.753, loss=30.520, backward_time=0.053, grad_norm=170.193, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.606e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:44:46,411 (trainer:732) INFO: 33epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.098, loss_ctc=50.207, loss_att=25.756, acc=0.751, loss=33.091, backward_time=0.057, grad_norm=183.984, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.604e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:46:18,387 (trainer:732) INFO: 33epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.101, loss_ctc=49.126, loss_att=25.258, acc=0.751, loss=32.418, backward_time=0.053, grad_norm=179.870, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=3.601e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:47:50,713 (trainer:732) INFO: 33epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.102, loss_ctc=48.527, loss_att=24.872, acc=0.753, loss=31.968, backward_time=0.054, grad_norm=177.861, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.598e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:48:38,666 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:49:23,003 (trainer:732) INFO: 33epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.101, loss_ctc=51.614, loss_att=26.512, acc=0.749, loss=34.043, backward_time=0.052, grad_norm=189.708, clip=100.000, loss_scale=426.190, optim_step_time=0.033, optim0_lr0=3.595e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:50:57,457 (trainer:732) INFO: 33epoch:train:3223-3580batch: iter_time=0.010, forward_time=0.101, loss_ctc=48.092, loss_att=24.758, acc=0.752, loss=31.758, backward_time=0.053, grad_norm=183.752, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.592e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:52:29,777 (trainer:732) INFO: 33epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.101, loss_ctc=48.475, loss_att=24.825, acc=0.753, loss=31.920, backward_time=0.054, grad_norm=182.199, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.590e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:54:01,131 (trainer:732) INFO: 33epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.098, loss_ctc=47.346, loss_att=24.361, acc=0.752, loss=31.256, backward_time=0.053, grad_norm=172.610, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.587e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:55:07,725 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:55:34,074 (trainer:732) INFO: 33epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.101, loss_ctc=49.876, loss_att=25.719, acc=0.750, loss=32.966, backward_time=0.053, grad_norm=187.562, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.584e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:57:05,782 (trainer:732) INFO: 33epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.100, loss_ctc=49.364, loss_att=25.285, acc=0.753, loss=32.509, backward_time=0.053, grad_norm=176.346, clip=100.000, loss_scale=552.045, optim_step_time=0.034, optim0_lr0=3.581e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:58:22,160 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 10:58:39,599 (trainer:732) INFO: 33epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.103, loss_ctc=51.492, loss_att=26.436, acc=0.750, loss=33.953, backward_time=0.053, grad_norm=184.725, clip=100.000, loss_scale=930.779, optim_step_time=0.033, optim0_lr0=3.579e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:00:11,845 (trainer:732) INFO: 33epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.100, loss_ctc=49.218, loss_att=25.268, acc=0.753, loss=32.453, backward_time=0.053, grad_norm=178.712, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.576e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:01:46,110 (trainer:732) INFO: 33epoch:train:5729-6086batch: iter_time=0.009, forward_time=0.101, loss_ctc=49.183, loss_att=25.218, acc=0.754, loss=32.407, backward_time=0.053, grad_norm=178.223, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.573e-05, train_time=0.263 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:03:18,589 (trainer:732) INFO: 33epoch:train:6087-6444batch: iter_time=0.006, forward_time=0.100, loss_ctc=50.126, loss_att=25.744, acc=0.751, loss=33.059, backward_time=0.053, grad_norm=181.263, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.571e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:04:28,488 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:04:52,467 (trainer:732) INFO: 33epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.101, loss_ctc=49.154, loss_att=25.108, acc=0.754, loss=32.322, backward_time=0.052, grad_norm=177.915, clip=100.000, loss_scale=446.028, optim_step_time=0.033, optim0_lr0=3.568e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:06:25,915 (trainer:732) INFO: 33epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.101, loss_ctc=49.559, loss_att=25.440, acc=0.754, loss=32.675, backward_time=0.053, grad_norm=183.248, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.565e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:07:35,029 (trainer:338) INFO: 33epoch results: [train] iter_time=0.004, forward_time=0.100, loss_ctc=49.649, loss_att=25.476, acc=0.751, loss=32.728, backward_time=0.053, grad_norm=180.859, clip=100.000, loss_scale=457.309, optim_step_time=0.033, optim0_lr0=3.591e-05, train_time=0.258, time=30 minutes and 48.02 seconds, total_count=236313, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=26.913, cer_ctc=0.154, loss_att=13.563, acc=0.870, cer=0.085, wer=0.836, loss=17.568, time=14.49 seconds, total_count=1749, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.96 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:07:38,869 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:07:38,890 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/23epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:07:38,891 (trainer:272) INFO: 34/100epoch started. Estimated time to finish: 1 day, 11 hours and 57 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:09:13,205 (trainer:732) INFO: 34epoch:train:1-358batch: iter_time=0.003, forward_time=0.106, loss_ctc=49.205, loss_att=25.114, acc=0.756, loss=32.341, backward_time=0.054, grad_norm=178.947, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.562e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:10:46,849 (trainer:732) INFO: 34epoch:train:359-716batch: iter_time=6.936e-04, forward_time=0.105, loss_ctc=50.644, loss_att=25.866, acc=0.755, loss=33.300, backward_time=0.054, grad_norm=184.731, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.560e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:12:20,310 (trainer:732) INFO: 34epoch:train:717-1074batch: iter_time=0.004, forward_time=0.102, loss_ctc=49.264, loss_att=25.143, acc=0.755, loss=32.380, backward_time=0.054, grad_norm=181.656, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.557e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:13:50,723 (trainer:732) INFO: 34epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.100, loss_ctc=49.121, loss_att=25.230, acc=0.754, loss=32.398, backward_time=0.053, grad_norm=175.825, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.554e-05, train_time=0.252 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:15:22,736 (trainer:732) INFO: 34epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.102, loss_ctc=46.952, loss_att=23.866, acc=0.762, loss=30.791, backward_time=0.055, grad_norm=171.950, clip=100.000, loss_scale=428.335, optim_step_time=0.033, optim0_lr0=3.552e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:16:54,451 (trainer:732) INFO: 34epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.101, loss_ctc=47.892, loss_att=24.420, acc=0.756, loss=31.462, backward_time=0.055, grad_norm=178.949, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.549e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:18:02,107 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:18:25,351 (trainer:732) INFO: 34epoch:train:2149-2506batch: iter_time=4.536e-04, forward_time=0.101, loss_ctc=48.465, loss_att=24.830, acc=0.759, loss=31.920, backward_time=0.053, grad_norm=185.485, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.546e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:19:57,250 (trainer:732) INFO: 34epoch:train:2507-2864batch: iter_time=0.001, forward_time=0.101, loss_ctc=49.318, loss_att=25.292, acc=0.754, loss=32.500, backward_time=0.053, grad_norm=182.862, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.544e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:21:29,535 (trainer:732) INFO: 34epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.100, loss_ctc=47.786, loss_att=24.524, acc=0.756, loss=31.503, backward_time=0.053, grad_norm=179.679, 
clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.541e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:23:01,567 (trainer:732) INFO: 34epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.100, loss_ctc=49.437, loss_att=25.263, acc=0.758, loss=32.515, backward_time=0.053, grad_norm=181.397, clip=100.000, loss_scale=556.335, optim_step_time=0.033, optim0_lr0=3.538e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:24:08,043 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:24:33,849 (trainer:732) INFO: 34epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.103, loss_ctc=48.795, loss_att=24.990, acc=0.754, loss=32.131, backward_time=0.053, grad_norm=180.697, clip=100.000, loss_scale=882.017, optim_step_time=0.033, optim0_lr0=3.536e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:26:06,598 (trainer:732) INFO: 34epoch:train:3939-4296batch: iter_time=0.003, forward_time=0.102, loss_ctc=50.686, loss_att=26.028, acc=0.753, loss=33.425, backward_time=0.053, grad_norm=181.243, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.533e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:27:39,306 (trainer:732) INFO: 34epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.101, loss_ctc=48.464, loss_att=24.925, acc=0.757, loss=31.987, backward_time=0.054, grad_norm=180.732, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.530e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:29:13,174 (trainer:732) INFO: 34epoch:train:4655-5012batch: iter_time=0.008, forward_time=0.102, loss_ctc=46.264, loss_att=23.640, acc=0.760, loss=30.428, backward_time=0.054, grad_norm=174.064, 
clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.528e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:30:46,726 (trainer:732) INFO: 34epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.102, loss_ctc=50.268, loss_att=25.746, acc=0.755, loss=33.103, backward_time=0.053, grad_norm=181.218, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.525e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:32:19,415 (trainer:732) INFO: 34epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.100, loss_ctc=49.728, loss_att=25.533, acc=0.757, loss=32.791, backward_time=0.054, grad_norm=180.010, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.523e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:33:51,706 (trainer:732) INFO: 34epoch:train:5729-6086batch: iter_time=0.003, forward_time=0.100, loss_ctc=49.938, loss_att=25.575, acc=0.758, loss=32.884, backward_time=0.054, grad_norm=183.093, clip=100.000, loss_scale=865.251, optim_step_time=0.034, optim0_lr0=3.520e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:34:17,740 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:35:17,483 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:35:24,560 (trainer:732) INFO: 34epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.101, loss_ctc=48.104, loss_att=24.662, acc=0.757, loss=31.695, backward_time=0.053, grad_norm=179.398, clip=100.000, loss_scale=652.549, optim_step_time=0.033, optim0_lr0=3.517e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:36:45,557 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:36:58,316 (trainer:732) INFO: 34epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.100, loss_ctc=45.767, loss_att=23.447, acc=0.760, loss=30.143, backward_time=0.054, grad_norm=177.603, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.515e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:38:31,160 (trainer:732) INFO: 34epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.100, loss_ctc=48.265, loss_att=24.808, acc=0.757, loss=31.845, backward_time=0.053, grad_norm=185.694, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.512e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:39:40,068 (trainer:338) INFO: 34epoch results: [train] iter_time=0.004, forward_time=0.101, loss_ctc=48.690, loss_att=24.930, acc=0.757, loss=32.058, backward_time=0.054, grad_norm=180.261, clip=100.000, loss_scale=501.952, optim_step_time=0.033, optim0_lr0=3.537e-05, train_time=0.258, time=30 minutes and 52.96 seconds, total_count=243474, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=25.908, cer_ctc=0.150, loss_att=13.150, acc=0.874, cer=0.083, wer=0.829, loss=16.977, time=14.66 seconds, total_count=1802, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.56 seconds, 
total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:39:43,697 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:39:43,718 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/24epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:39:43,718 (trainer:272) INFO: 35/100epoch started. Estimated time to finish: 1 day, 11 hours and 25 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:41:14,905 (trainer:732) INFO: 35epoch:train:1-358batch: iter_time=0.003, forward_time=0.099, loss_ctc=48.395, loss_att=24.728, acc=0.761, loss=31.828, backward_time=0.052, grad_norm=183.828, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.510e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:42:18,013 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:42:46,197 (trainer:732) INFO: 35epoch:train:359-716batch: iter_time=4.901e-04, forward_time=0.101, loss_ctc=47.787, loss_att=24.466, acc=0.759, loss=31.462, backward_time=0.053, grad_norm=179.075, clip=100.000, loss_scale=433.120, optim_step_time=0.033, optim0_lr0=3.507e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:44:19,654 (trainer:732) INFO: 35epoch:train:717-1074batch: iter_time=0.002, forward_time=0.104, loss_ctc=48.604, loss_att=24.778, acc=0.761, loss=31.925, backward_time=0.053, grad_norm=189.429, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=3.505e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:45:51,833 (trainer:732) INFO: 35epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.101, loss_ctc=48.338, loss_att=24.615, acc=0.762, loss=31.732, backward_time=0.054, grad_norm=185.728, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.502e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:46:32,650 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:47:24,477 (trainer:732) INFO: 35epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.102, loss_ctc=50.163, loss_att=25.714, acc=0.758, loss=33.048, backward_time=0.054, grad_norm=190.295, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.499e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:48:16,058 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:48:56,245 (trainer:732) INFO: 35epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.101, loss_ctc=47.630, loss_att=24.318, acc=0.763, loss=31.312, backward_time=0.055, grad_norm=178.968, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.497e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:50:28,093 (trainer:732) INFO: 35epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.101, loss_ctc=46.844, loss_att=24.002, acc=0.761, loss=30.854, backward_time=0.054, grad_norm=177.523, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.494e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:51:59,490 (trainer:732) INFO: 35epoch:train:2507-2864batch: iter_time=0.005, forward_time=0.099, loss_ctc=44.927, loss_att=22.975, acc=0.763, loss=29.561, backward_time=0.054, grad_norm=172.813, clip=100.000, loss_scale=440.492, optim_step_time=0.033, optim0_lr0=3.492e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:53:31,569 (trainer:732) INFO: 35epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.100, loss_ctc=47.984, loss_att=24.525, acc=0.761, loss=31.563, backward_time=0.053, grad_norm=179.477, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.489e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:55:04,878 (trainer:732) INFO: 35epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.102, loss_ctc=49.069, loss_att=25.135, acc=0.761, loss=32.315, backward_time=0.053, grad_norm=179.799, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.487e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:56:35,243 (trainer:732) INFO: 35epoch:train:3581-3938batch: iter_time=0.003, 
forward_time=0.099, loss_ctc=47.009, loss_att=24.059, acc=0.762, loss=30.944, backward_time=0.053, grad_norm=185.378, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.484e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:58:06,903 (trainer:732) INFO: 35epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.100, loss_ctc=45.936, loss_att=23.435, acc=0.766, loss=30.185, backward_time=0.053, grad_norm=175.903, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.482e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:59:34,155 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 11:59:40,858 (trainer:732) INFO: 35epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.102, loss_ctc=47.358, loss_att=24.160, acc=0.762, loss=31.120, backward_time=0.053, grad_norm=176.913, clip=100.000, loss_scale=547.854, optim_step_time=0.034, optim0_lr0=3.479e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:01:14,127 (trainer:732) INFO: 35epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.101, loss_ctc=47.375, loss_att=24.167, acc=0.764, loss=31.130, backward_time=0.054, grad_norm=180.931, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.477e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:02:46,086 (trainer:732) INFO: 35epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.099, loss_ctc=47.846, loss_att=24.498, acc=0.762, loss=31.503, backward_time=0.053, grad_norm=181.048, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.474e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:04:18,566 (trainer:732) INFO: 35epoch:train:5371-5728batch: iter_time=0.004, 
forward_time=0.100, loss_ctc=47.629, loss_att=24.334, acc=0.763, loss=31.322, backward_time=0.054, grad_norm=181.638, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.472e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:05:22,515 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:05:53,515 (trainer:732) INFO: 35epoch:train:5729-6086batch: iter_time=0.009, forward_time=0.101, loss_ctc=48.671, loss_att=24.998, acc=0.760, loss=32.100, backward_time=0.054, grad_norm=180.009, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.469e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:07:27,672 (trainer:732) INFO: 35epoch:train:6087-6444batch: iter_time=0.003, forward_time=0.105, loss_ctc=48.968, loss_att=25.053, acc=0.762, loss=32.228, backward_time=0.053, grad_norm=184.980, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.467e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:08:23,920 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:09:01,030 (trainer:732) INFO: 35epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.103, loss_ctc=48.455, loss_att=24.871, acc=0.759, loss=31.946, backward_time=0.053, grad_norm=185.258, clip=100.000, loss_scale=549.289, optim_step_time=0.033, optim0_lr0=3.464e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:10:35,062 (trainer:732) INFO: 35epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.101, loss_ctc=45.346, loss_att=23.147, acc=0.767, loss=29.806, backward_time=0.053, grad_norm=172.695, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.462e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:11:44,473 (trainer:338) INFO: 35epoch results: [train] iter_time=0.004, forward_time=0.101, loss_ctc=47.693, loss_att=24.386, acc=0.762, loss=31.378, backward_time=0.053, grad_norm=181.077, clip=100.000, loss_scale=444.120, optim_step_time=0.033, optim0_lr0=3.486e-05, train_time=0.258, time=30 minutes and 51.98 seconds, total_count=250635, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=25.327, cer_ctc=0.144, loss_att=12.843, acc=0.876, cer=0.080, wer=0.828, loss=16.588, time=14.58 seconds, total_count=1855, gpu_max_cached_mem_GB=28.453, [att_plot] time=54.19 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:11:48,129 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:11:48,148 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/25epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:11:48,148 (trainer:272) INFO: 36/100epoch started. 
Estimated time to finish: 1 day, 10 hours and 52 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:13:16,116 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:13:20,827 (trainer:732) INFO: 36epoch:train:1-358batch: iter_time=0.003, forward_time=0.102, loss_ctc=47.821, loss_att=24.391, acc=0.764, loss=31.420, backward_time=0.053, grad_norm=181.460, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.459e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:14:53,187 (trainer:732) INFO: 36epoch:train:359-716batch: iter_time=6.946e-04, forward_time=0.104, loss_ctc=48.832, loss_att=24.875, acc=0.764, loss=32.062, backward_time=0.052, grad_norm=180.077, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.457e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:16:25,086 (trainer:732) INFO: 36epoch:train:717-1074batch: iter_time=0.004, forward_time=0.101, loss_ctc=44.239, loss_att=22.538, acc=0.767, loss=29.048, backward_time=0.054, grad_norm=172.040, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.454e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:17:58,063 (trainer:732) INFO: 36epoch:train:1075-1432batch: iter_time=5.357e-04, forward_time=0.103, loss_ctc=47.612, loss_att=24.341, acc=0.766, loss=31.322, backward_time=0.053, grad_norm=184.729, clip=100.000, loss_scale=512.000, optim_step_time=0.035, optim0_lr0=3.452e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:18:20,122 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:19:28,713 (trainer:732) INFO: 36epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.100, loss_ctc=47.239, loss_att=24.110, acc=0.766, loss=31.049, backward_time=0.054, grad_norm=177.545, clip=100.000, loss_scale=540.683, optim_step_time=0.033, optim0_lr0=3.449e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:21:01,806 (trainer:732) INFO: 36epoch:train:1791-2148batch: iter_time=0.001, forward_time=0.105, loss_ctc=47.512, loss_att=24.231, acc=0.767, loss=31.215, backward_time=0.053, grad_norm=188.494, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.447e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:22:31,909 (trainer:732) INFO: 36epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.099, loss_ctc=45.593, loss_att=23.226, acc=0.769, loss=29.936, backward_time=0.054, grad_norm=177.118, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.444e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:24:04,274 (trainer:732) INFO: 36epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.101, loss_ctc=46.319, loss_att=23.643, acc=0.766, loss=30.446, backward_time=0.053, grad_norm=182.048, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.442e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:25:35,681 (trainer:732) INFO: 36epoch:train:2865-3222batch: iter_time=0.002, forward_time=0.100, loss_ctc=48.379, loss_att=24.765, acc=0.766, loss=31.849, backward_time=0.053, grad_norm=186.579, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.440e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:27:07,954 (trainer:732) INFO: 36epoch:train:3223-3580batch: iter_time=0.004, 
forward_time=0.101, loss_ctc=47.972, loss_att=24.565, acc=0.761, loss=31.587, backward_time=0.053, grad_norm=186.112, clip=100.000, loss_scale=599.240, optim_step_time=0.033, optim0_lr0=3.437e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:27:20,172 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:28:40,413 (trainer:732) INFO: 36epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.101, loss_ctc=46.959, loss_att=23.943, acc=0.768, loss=30.847, backward_time=0.054, grad_norm=179.344, clip=100.000, loss_scale=577.972, optim_step_time=0.033, optim0_lr0=3.435e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:30:14,945 (trainer:732) INFO: 36epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.104, loss_ctc=47.078, loss_att=24.087, acc=0.764, loss=30.984, backward_time=0.054, grad_norm=182.842, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.432e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:31:48,861 (trainer:732) INFO: 36epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.103, loss_ctc=48.944, loss_att=25.124, acc=0.765, loss=32.270, backward_time=0.053, grad_norm=192.170, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.430e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:33:22,670 (trainer:732) INFO: 36epoch:train:4655-5012batch: iter_time=0.007, forward_time=0.102, loss_ctc=46.914, loss_att=23.933, acc=0.767, loss=30.828, backward_time=0.053, grad_norm=179.379, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.428e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:34:54,754 (trainer:732) INFO: 36epoch:train:5013-5370batch: iter_time=0.004, 
forward_time=0.101, loss_ctc=45.635, loss_att=23.318, acc=0.768, loss=30.013, backward_time=0.055, grad_norm=177.213, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.425e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:35:05,192 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:36:26,948 (trainer:732) INFO: 36epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.100, loss_ctc=46.069, loss_att=23.584, acc=0.767, loss=30.330, backward_time=0.053, grad_norm=180.310, clip=100.000, loss_scale=656.447, optim_step_time=0.033, optim0_lr0=3.423e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:37:03,932 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:37:57,927 (trainer:732) INFO: 36epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.099, loss_ctc=46.226, loss_att=23.585, acc=0.770, loss=30.377, backward_time=0.053, grad_norm=181.831, clip=100.000, loss_scale=718.521, optim_step_time=0.033, optim0_lr0=3.420e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:38:40,877 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:39:31,299 (trainer:732) INFO: 36epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.101, loss_ctc=45.328, loss_att=23.194, acc=0.769, loss=29.834, backward_time=0.053, grad_norm=184.522, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.418e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:41:03,699 (trainer:732) INFO: 36epoch:train:6445-6802batch: iter_time=0.004, forward_time=0.101, loss_ctc=46.852, loss_att=23.877, acc=0.768, loss=30.769, backward_time=0.054, grad_norm=186.502, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.416e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:42:37,138 (trainer:732) INFO: 36epoch:train:6803-7160batch: iter_time=0.005, forward_time=0.102, loss_ctc=45.731, loss_att=23.312, acc=0.771, loss=30.038, backward_time=0.054, grad_norm=180.212, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.413e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:43:46,260 (trainer:338) INFO: 36epoch results: [train] iter_time=0.003, forward_time=0.101, loss_ctc=46.844, loss_att=23.922, acc=0.767, loss=30.799, backward_time=0.053, grad_norm=182.019, clip=100.000, loss_scale=538.609, optim_step_time=0.033, optim0_lr0=3.436e-05, train_time=0.258, time=30 minutes and 49.64 seconds, total_count=257796, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=24.578, cer_ctc=0.139, loss_att=12.510, acc=0.879, cer=0.079, wer=0.815, loss=16.130, time=14.76 seconds, total_count=1908, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.7 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:43:49,936 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 
2023-05-13 12:43:49,958 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/26epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:43:49,959 (trainer:272) INFO: 37/100epoch started. Estimated time to finish: 1 day, 10 hours and 20 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:45:21,961 (trainer:732) INFO: 37epoch:train:1-358batch: iter_time=0.003, forward_time=0.101, loss_ctc=44.936, loss_att=22.825, acc=0.772, loss=29.459, backward_time=0.053, grad_norm=178.764, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.411e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:46:53,356 (trainer:732) INFO: 37epoch:train:359-716batch: iter_time=0.003, forward_time=0.100, loss_ctc=46.607, loss_att=23.829, acc=0.768, loss=30.663, backward_time=0.053, grad_norm=183.860, clip=100.000, loss_scale=517.721, optim_step_time=0.033, optim0_lr0=3.408e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:46:58,731 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:48:25,143 (trainer:732) INFO: 37epoch:train:717-1074batch: iter_time=0.002, forward_time=0.101, loss_ctc=45.187, loss_att=22.953, acc=0.774, loss=29.623, backward_time=0.054, grad_norm=183.421, clip=100.000, loss_scale=540.683, optim_step_time=0.033, optim0_lr0=3.406e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:49:56,580 (trainer:732) INFO: 37epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=46.832, loss_att=23.856, acc=0.772, loss=30.749, backward_time=0.053, grad_norm=184.722, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.404e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:51:27,902 (trainer:732) INFO: 37epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.100, loss_ctc=47.202, loss_att=24.074, acc=0.769, loss=31.013, backward_time=0.053, grad_norm=186.361, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.401e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:52:12,066 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:52:13,163 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:52:59,858 (trainer:732) INFO: 37epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.101, loss_ctc=46.120, loss_att=23.545, acc=0.770, loss=30.318, backward_time=0.054, grad_norm=183.208, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.399e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:54:31,254 (trainer:732) INFO: 37epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.100, loss_ctc=47.563, loss_att=24.414, acc=0.765, loss=31.358, backward_time=0.053, grad_norm=183.174, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.397e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:56:01,933 (trainer:732) INFO: 37epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.098, loss_ctc=45.551, loss_att=23.180, acc=0.772, loss=29.891, backward_time=0.053, grad_norm=176.046, clip=100.000, loss_scale=693.631, optim_step_time=0.033, optim0_lr0=3.394e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:57:34,839 (trainer:732) INFO: 37epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.101, loss_ctc=48.041, loss_att=24.632, acc=0.766, loss=31.655, backward_time=0.054, grad_norm=184.374, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.033, optim0_lr0=3.392e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:59:06,451 (trainer:732) INFO: 37epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.099, loss_ctc=46.453, loss_att=23.674, acc=0.771, loss=30.508, backward_time=0.055, grad_norm=179.211, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.033, optim0_lr0=3.390e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 12:59:44,335 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:00:38,915 (trainer:732) INFO: 37epoch:train:3581-3938batch: iter_time=0.002, forward_time=0.101, loss_ctc=47.879, loss_att=24.418, acc=0.769, loss=31.457, backward_time=0.054, grad_norm=182.215, clip=100.000, loss_scale=719.955, optim_step_time=0.033, optim0_lr0=3.387e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:02:11,194 (trainer:732) INFO: 37epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.099, loss_ctc=45.540, loss_att=23.120, acc=0.774, loss=29.846, backward_time=0.053, grad_norm=180.098, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.385e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:03:43,724 (trainer:732) INFO: 37epoch:train:4297-4654batch: iter_time=0.007, forward_time=0.099, loss_ctc=44.270, loss_att=22.468, acc=0.774, loss=29.009, backward_time=0.053, grad_norm=178.287, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.383e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:05:16,621 (trainer:732) INFO: 37epoch:train:4655-5012batch: iter_time=0.010, forward_time=0.098, loss_ctc=45.477, loss_att=23.128, acc=0.771, loss=29.833, backward_time=0.053, grad_norm=184.447, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.381e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:06:48,304 (trainer:732) INFO: 37epoch:train:5013-5370batch: iter_time=0.002, forward_time=0.100, loss_ctc=48.251, loss_att=24.716, acc=0.767, loss=31.777, backward_time=0.053, grad_norm=187.895, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.378e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:08:20,503 (trainer:732) INFO: 37epoch:train:5371-5728batch: iter_time=0.008, 
forward_time=0.098, loss_ctc=44.181, loss_att=22.525, acc=0.775, loss=29.022, backward_time=0.053, grad_norm=176.534, clip=100.000, loss_scale=514.860, optim_step_time=0.033, optim0_lr0=3.376e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:09:03,572 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:09:08,463 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:09:52,588 (trainer:732) INFO: 37epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.099, loss_ctc=45.720, loss_att=23.315, acc=0.771, loss=30.036, backward_time=0.053, grad_norm=178.546, clip=100.000, loss_scale=750.073, optim_step_time=0.033, optim0_lr0=3.374e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:11:25,934 (trainer:732) INFO: 37epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.099, loss_ctc=44.214, loss_att=22.506, acc=0.775, loss=29.018, backward_time=0.055, grad_norm=178.740, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.371e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:12:58,967 (trainer:732) INFO: 37epoch:train:6445-6802batch: iter_time=0.008, forward_time=0.099, loss_ctc=44.999, loss_att=22.862, acc=0.773, loss=29.503, backward_time=0.055, grad_norm=173.264, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.369e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:14:31,413 (trainer:732) INFO: 37epoch:train:6803-7160batch: iter_time=0.005, forward_time=0.100, loss_ctc=47.346, loss_att=24.276, acc=0.769, loss=31.197, backward_time=0.053, grad_norm=185.153, 
clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.367e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:15:41,070 (trainer:338) INFO: 37epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=46.093, loss_att=23.502, acc=0.771, loss=30.279, backward_time=0.054, grad_norm=181.422, clip=100.000, loss_scale=596.403, optim_step_time=0.033, optim0_lr0=3.389e-05, train_time=0.257, time=30 minutes and 42.13 seconds, total_count=264957, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=23.963, cer_ctc=0.136, loss_att=12.185, acc=0.883, cer=0.076, wer=0.799, loss=15.719, time=15.43 seconds, total_count=1961, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.55 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:15:44,947 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:15:44,968 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/27epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:15:44,969 (trainer:272) INFO: 38/100epoch started. 
Estimated time to finish: 1 day, 9 hours and 47 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:17:16,438 (trainer:732) INFO: 38epoch:train:1-358batch: iter_time=0.002, forward_time=0.101, loss_ctc=46.428, loss_att=23.596, acc=0.773, loss=30.446, backward_time=0.053, grad_norm=178.790, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.364e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:18:47,302 (trainer:732) INFO: 38epoch:train:359-716batch: iter_time=0.002, forward_time=0.100, loss_ctc=44.019, loss_att=22.297, acc=0.777, loss=28.813, backward_time=0.053, grad_norm=178.888, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.362e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:18:52,013 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:20:18,379 (trainer:732) INFO: 38epoch:train:717-1074batch: iter_time=0.002, forward_time=0.100, loss_ctc=44.645, loss_att=22.698, acc=0.774, loss=29.282, backward_time=0.053, grad_norm=180.759, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.360e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:21:50,537 (trainer:732) INFO: 38epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.101, loss_ctc=44.182, loss_att=22.517, acc=0.773, loss=29.017, backward_time=0.055, grad_norm=180.416, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.358e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:23:21,045 (trainer:732) INFO: 38epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.099, loss_ctc=45.003, loss_att=22.888, acc=0.774, loss=29.522, backward_time=0.054, grad_norm=177.401, clip=100.000, loss_scale=512.000, optim_step_time=0.033, 
optim0_lr0=3.355e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:24:53,427 (trainer:732) INFO: 38epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.100, loss_ctc=46.706, loss_att=23.780, acc=0.773, loss=30.658, backward_time=0.055, grad_norm=188.252, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.353e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:25:23,241 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:26:24,482 (trainer:732) INFO: 38epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.099, loss_ctc=46.596, loss_att=23.752, acc=0.771, loss=30.605, backward_time=0.054, grad_norm=184.631, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.351e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:27:55,745 (trainer:732) INFO: 38epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.100, loss_ctc=43.941, loss_att=22.395, acc=0.775, loss=28.859, backward_time=0.053, grad_norm=176.287, clip=100.000, loss_scale=696.492, optim_step_time=0.033, optim0_lr0=3.349e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:27:59,365 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:28:48,429 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:29:28,080 (trainer:732) INFO: 38epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.103, loss_ctc=44.246, loss_att=22.496, acc=0.780, loss=29.021, backward_time=0.053, grad_norm=181.353, clip=100.000, loss_scale=532.078, optim_step_time=0.033, optim0_lr0=3.346e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:31:01,155 (trainer:732) INFO: 38epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.103, loss_ctc=45.646, loss_att=23.308, acc=0.775, loss=30.010, backward_time=0.053, grad_norm=186.379, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.344e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:32:34,791 (trainer:732) INFO: 38epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.101, loss_ctc=44.503, loss_att=22.787, acc=0.776, loss=29.302, backward_time=0.055, grad_norm=186.348, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.342e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:34:06,257 (trainer:732) INFO: 38epoch:train:3939-4296batch: iter_time=0.002, forward_time=0.100, loss_ctc=45.951, loss_att=23.399, acc=0.774, loss=30.164, backward_time=0.054, grad_norm=181.863, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.340e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:35:21,291 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:35:38,716 (trainer:732) INFO: 38epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.101, loss_ctc=46.090, loss_att=23.531, acc=0.775, loss=30.299, backward_time=0.055, grad_norm=184.375, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.338e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:36:43,990 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:37:10,889 (trainer:732) INFO: 38epoch:train:4655-5012batch: iter_time=0.002, forward_time=0.101, loss_ctc=47.220, loss_att=24.106, acc=0.774, loss=31.040, backward_time=0.053, grad_norm=183.905, clip=100.000, loss_scale=547.854, optim_step_time=0.033, optim0_lr0=3.335e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:38:43,885 (trainer:732) INFO: 38epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.101, loss_ctc=46.779, loss_att=23.813, acc=0.776, loss=30.703, backward_time=0.053, grad_norm=185.225, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.333e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:40:17,519 (trainer:732) INFO: 38epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.102, loss_ctc=45.311, loss_att=23.152, acc=0.777, loss=29.800, backward_time=0.055, grad_norm=183.678, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.331e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:41:31,083 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:41:48,995 (trainer:732) INFO: 38epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.099, loss_ctc=44.108, loss_att=22.381, acc=0.779, loss=28.899, backward_time=0.053, grad_norm=179.028, clip=100.000, loss_scale=463.238, optim_step_time=0.033, optim0_lr0=3.329e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:43:21,017 (trainer:732) INFO: 38epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.100, loss_ctc=44.546, loss_att=22.545, acc=0.777, loss=29.145, backward_time=0.054, grad_norm=177.931, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.327e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:44:53,545 (trainer:732) INFO: 38epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.100, loss_ctc=44.939, loss_att=22.925, acc=0.777, loss=29.529, backward_time=0.053, grad_norm=181.313, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.324e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:46:26,813 (trainer:732) INFO: 38epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.100, loss_ctc=45.136, loss_att=23.043, acc=0.779, loss=29.671, backward_time=0.053, grad_norm=184.465, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.322e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:47:36,165 (trainer:338) INFO: 38epoch results: [train] iter_time=0.004, forward_time=0.100, loss_ctc=45.279, loss_att=23.059, acc=0.775, loss=29.725, backward_time=0.054, grad_norm=182.055, clip=100.000, loss_scale=483.134, optim_step_time=0.033, optim0_lr0=3.343e-05, train_time=0.257, time=30 minutes and 42.52 seconds, total_count=272118, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=23.317, cer_ctc=0.132, loss_att=11.854, acc=0.886, cer=0.074, 
wer=0.798, loss=15.293, time=14.43 seconds, total_count=2014, gpu_max_cached_mem_GB=28.453, [att_plot] time=54.24 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:47:39,883 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:47:39,905 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/28epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:47:39,905 (trainer:272) INFO: 39/100epoch started. Estimated time to finish: 1 day, 9 hours and 15 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:49:12,417 (trainer:732) INFO: 39epoch:train:1-358batch: iter_time=0.003, forward_time=0.101, loss_ctc=44.308, loss_att=22.456, acc=0.779, loss=29.011, backward_time=0.053, grad_norm=179.510, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.320e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:50:43,115 (trainer:732) INFO: 39epoch:train:359-716batch: iter_time=0.001, forward_time=0.099, loss_ctc=44.295, loss_att=22.470, acc=0.780, loss=29.018, backward_time=0.053, grad_norm=175.761, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.318e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:52:13,985 (trainer:732) INFO: 39epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=44.475, loss_att=22.667, acc=0.779, loss=29.209, backward_time=0.053, grad_norm=187.646, clip=100.000, loss_scale=411.173, optim_step_time=0.033, optim0_lr0=3.316e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:53:44,069 (trainer:732) INFO: 39epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.098, loss_ctc=43.703, loss_att=22.153, acc=0.779, loss=28.618, 
backward_time=0.053, grad_norm=176.060, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.313e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:55:15,432 (trainer:732) INFO: 39epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.099, loss_ctc=44.770, loss_att=22.750, acc=0.780, loss=29.356, backward_time=0.053, grad_norm=186.532, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.311e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:56:46,560 (trainer:732) INFO: 39epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.098, loss_ctc=42.996, loss_att=21.873, acc=0.779, loss=28.210, backward_time=0.054, grad_norm=183.369, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.309e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:58:18,294 (trainer:732) INFO: 39epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.099, loss_ctc=45.049, loss_att=22.970, acc=0.779, loss=29.594, backward_time=0.054, grad_norm=183.842, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.307e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:59:29,971 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 13:59:51,087 (trainer:732) INFO: 39epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.100, loss_ctc=46.493, loss_att=23.607, acc=0.778, loss=30.473, backward_time=0.053, grad_norm=187.561, clip=100.000, loss_scale=522.011, optim_step_time=0.033, optim0_lr0=3.305e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:00:41,553 (trainer:663) WARNING: The grad norm is inf. 
Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:01:21,693 (trainer:732) INFO: 39epoch:train:2865-3222batch: iter_time=0.002, forward_time=0.099, loss_ctc=44.711, loss_att=22.791, acc=0.780, loss=29.367, backward_time=0.053, grad_norm=177.987, clip=100.000, loss_scale=797.401, optim_step_time=0.033, optim0_lr0=3.303e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:02:53,379 (trainer:732) INFO: 39epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.099, loss_ctc=45.820, loss_att=23.414, acc=0.778, loss=30.136, backward_time=0.054, grad_norm=193.882, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.300e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:04:25,520 (trainer:732) INFO: 39epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.099, loss_ctc=45.411, loss_att=23.139, acc=0.778, loss=29.820, backward_time=0.054, grad_norm=183.498, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.298e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:05:51,228 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:05:59,491 (trainer:732) INFO: 39epoch:train:3939-4296batch: iter_time=0.009, forward_time=0.100, loss_ctc=44.616, loss_att=22.658, acc=0.780, loss=29.245, backward_time=0.053, grad_norm=181.718, clip=100.000, loss_scale=490.487, optim_step_time=0.033, optim0_lr0=3.296e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:06:08,130 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:07:30,863 (trainer:732) INFO: 39epoch:train:4297-4654batch: iter_time=0.008, forward_time=0.097, loss_ctc=42.608, loss_att=21.577, acc=0.780, loss=27.886, backward_time=0.053, grad_norm=180.064, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.294e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:09:02,635 (trainer:732) INFO: 39epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.099, loss_ctc=45.403, loss_att=22.978, acc=0.782, loss=29.706, backward_time=0.054, grad_norm=187.721, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.292e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:10:34,511 (trainer:732) INFO: 39epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.098, loss_ctc=43.069, loss_att=21.928, acc=0.781, loss=28.270, backward_time=0.054, grad_norm=181.966, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.290e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:12:07,736 (trainer:732) INFO: 39epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.100, loss_ctc=44.179, loss_att=22.441, acc=0.782, loss=28.962, backward_time=0.053, grad_norm=181.767, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.288e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:13:41,392 (trainer:732) INFO: 39epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.100, loss_ctc=45.207, loss_att=23.068, acc=0.779, loss=29.710, backward_time=0.053, grad_norm=186.902, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.286e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:15:14,830 (trainer:732) INFO: 39epoch:train:6087-6444batch: iter_time=0.008, 
forward_time=0.099, loss_ctc=44.483, loss_att=22.697, acc=0.779, loss=29.233, backward_time=0.054, grad_norm=185.737, clip=100.000, loss_scale=383.285, optim_step_time=0.033, optim0_lr0=3.283e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:16:48,290 (trainer:732) INFO: 39epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.100, loss_ctc=45.803, loss_att=23.477, acc=0.776, loss=30.175, backward_time=0.054, grad_norm=187.146, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.281e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:18:14,325 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:18:22,434 (trainer:732) INFO: 39epoch:train:6803-7160batch: iter_time=0.010, forward_time=0.100, loss_ctc=44.762, loss_att=22.836, acc=0.776, loss=29.414, backward_time=0.053, grad_norm=183.756, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.279e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:19:31,275 (trainer:338) INFO: 39epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=44.596, loss_att=22.691, acc=0.779, loss=29.262, backward_time=0.053, grad_norm=183.622, clip=100.000, loss_scale=424.569, optim_step_time=0.033, optim0_lr0=3.299e-05, train_time=0.257, time=30 minutes and 43.22 seconds, total_count=279279, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=22.787, cer_ctc=0.127, loss_att=11.654, acc=0.887, cer=0.073, wer=0.799, loss=14.994, time=14.44 seconds, total_count=2067, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.7 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:19:35,045 (trainer:386) INFO: The 
best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:19:35,066 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/29epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:19:35,066 (trainer:272) INFO: 40/100epoch started. Estimated time to finish: 1 day, 8 hours and 42 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:21:06,869 (trainer:732) INFO: 40epoch:train:1-358batch: iter_time=0.004, forward_time=0.100, loss_ctc=43.127, loss_att=21.970, acc=0.782, loss=28.317, backward_time=0.053, grad_norm=185.632, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.277e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:22:29,972 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:22:37,581 (trainer:732) INFO: 40epoch:train:359-716batch: iter_time=0.002, forward_time=0.099, loss_ctc=42.839, loss_att=21.630, acc=0.786, loss=27.993, backward_time=0.053, grad_norm=177.330, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.275e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:24:08,257 (trainer:732) INFO: 40epoch:train:717-1074batch: iter_time=0.003, forward_time=0.098, loss_ctc=43.567, loss_att=22.167, acc=0.782, loss=28.587, backward_time=0.053, grad_norm=187.132, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.273e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:25:02,508 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:25:39,806 (trainer:732) INFO: 40epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.100, loss_ctc=44.394, loss_att=22.628, acc=0.779, loss=29.158, backward_time=0.053, grad_norm=185.621, clip=100.000, loss_scale=770.151, optim_step_time=0.033, optim0_lr0=3.271e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:27:10,781 (trainer:732) INFO: 40epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.099, loss_ctc=42.732, loss_att=21.706, acc=0.782, loss=28.014, backward_time=0.054, grad_norm=182.370, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.269e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:28:41,248 (trainer:732) INFO: 40epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.099, loss_ctc=43.557, loss_att=22.134, acc=0.784, loss=28.561, backward_time=0.053, grad_norm=180.187, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.267e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:30:12,912 (trainer:732) INFO: 40epoch:train:2149-2506batch: iter_time=0.001, forward_time=0.102, loss_ctc=43.085, loss_att=21.947, acc=0.782, loss=28.288, backward_time=0.053, grad_norm=177.347, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.265e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:31:44,329 (trainer:732) INFO: 40epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.100, loss_ctc=42.529, loss_att=21.574, acc=0.784, loss=27.860, backward_time=0.053, grad_norm=180.604, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.263e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:32:17,101 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:33:16,518 (trainer:732) INFO: 40epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.101, loss_ctc=45.888, loss_att=23.374, acc=0.779, loss=30.128, backward_time=0.053, grad_norm=189.302, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.260e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:33:58,503 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:34:50,052 (trainer:732) INFO: 40epoch:train:3223-3580batch: iter_time=0.010, forward_time=0.100, loss_ctc=42.803, loss_att=21.717, acc=0.782, loss=28.043, backward_time=0.054, grad_norm=182.182, clip=100.000, loss_scale=651.115, optim_step_time=0.033, optim0_lr0=3.258e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:36:22,301 (trainer:732) INFO: 40epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.101, loss_ctc=44.579, loss_att=22.665, acc=0.783, loss=29.239, backward_time=0.054, grad_norm=187.351, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.256e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:37:55,840 (trainer:732) INFO: 40epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.101, loss_ctc=46.087, loss_att=23.404, acc=0.782, loss=30.208, backward_time=0.053, grad_norm=186.985, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.254e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:39:31,057 (trainer:732) INFO: 40epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.103, loss_ctc=45.244, loss_att=23.016, acc=0.783, loss=29.684, backward_time=0.054, grad_norm=189.861, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.252e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:41:04,687 (trainer:732) INFO: 40epoch:train:4655-5012batch: iter_time=0.008, forward_time=0.100, loss_ctc=44.572, loss_att=22.728, acc=0.781, loss=29.282, backward_time=0.053, grad_norm=190.321, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.250e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:42:37,338 (trainer:732) INFO: 40epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.101, loss_ctc=42.689, loss_att=21.655, acc=0.786, loss=27.965, backward_time=0.054, grad_norm=176.539, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.248e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:43:41,124 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:44:10,264 (trainer:732) INFO: 40epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.101, loss_ctc=44.445, loss_att=22.566, acc=0.784, loss=29.130, backward_time=0.055, grad_norm=188.733, clip=100.000, loss_scale=844.728, optim_step_time=0.033, optim0_lr0=3.246e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:45:42,136 (trainer:732) INFO: 40epoch:train:5729-6086batch: iter_time=0.004, forward_time=0.099, loss_ctc=43.495, loss_att=22.124, acc=0.783, loss=28.536, backward_time=0.055, grad_norm=183.249, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.244e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:47:15,352 (trainer:732) INFO: 40epoch:train:6087-6444batch: iter_time=0.004, forward_time=0.102, loss_ctc=45.976, loss_att=23.327, acc=0.781, loss=30.121, backward_time=0.053, grad_norm=184.096, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.242e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:48:36,320 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:48:48,258 (trainer:732) INFO: 40epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.102, loss_ctc=44.298, loss_att=22.557, acc=0.785, loss=29.079, backward_time=0.053, grad_norm=185.056, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.240e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:50:19,792 (trainer:732) INFO: 40epoch:train:6803-7160batch: iter_time=0.006, forward_time=0.099, loss_ctc=44.107, loss_att=22.429, acc=0.783, loss=28.932, backward_time=0.053, grad_norm=183.090, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.238e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:51:28,783 (trainer:338) INFO: 40epoch results: [train] iter_time=0.004, forward_time=0.100, loss_ctc=43.985, loss_att=22.358, acc=0.783, loss=28.846, backward_time=0.053, grad_norm=184.142, clip=100.000, loss_scale=548.408, optim_step_time=0.033, optim0_lr0=3.257e-05, train_time=0.257, time=30 minutes and 45.41 seconds, total_count=286440, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=22.337, cer_ctc=0.125, loss_att=11.427, acc=0.889, cer=0.072, wer=0.787, loss=14.700, time=14.61 seconds, total_count=2120, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.7 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:51:32,371 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:51:32,394 
(trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/30epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:51:32,394 (trainer:272) INFO: 41/100epoch started. Estimated time to finish: 1 day, 8 hours and 9 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:53:04,348 (trainer:732) INFO: 41epoch:train:1-358batch: iter_time=0.003, forward_time=0.100, loss_ctc=43.281, loss_att=21.903, acc=0.786, loss=28.316, backward_time=0.053, grad_norm=176.605, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.236e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:53:32,271 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:53:39,319 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:54:36,467 (trainer:732) INFO: 41epoch:train:359-716batch: iter_time=0.006, forward_time=0.100, loss_ctc=42.120, loss_att=21.391, acc=0.786, loss=27.610, backward_time=0.053, grad_norm=177.527, clip=100.000, loss_scale=527.776, optim_step_time=0.033, optim0_lr0=3.234e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:56:08,724 (trainer:732) INFO: 41epoch:train:717-1074batch: iter_time=0.002, forward_time=0.101, loss_ctc=45.229, loss_att=22.951, acc=0.785, loss=29.634, backward_time=0.054, grad_norm=185.269, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.232e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:57:40,120 (trainer:732) INFO: 41epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.101, loss_ctc=43.134, loss_att=21.811, acc=0.784, loss=28.208, backward_time=0.053, grad_norm=181.263, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.230e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 14:59:11,497 (trainer:732) INFO: 41epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.100, loss_ctc=42.785, loss_att=21.657, acc=0.787, loss=27.996, backward_time=0.053, grad_norm=180.917, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.228e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:00:43,073 (trainer:732) INFO: 41epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.100, loss_ctc=45.337, loss_att=23.011, acc=0.784, loss=29.709, backward_time=0.053, grad_norm=192.227, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.226e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:02:15,204 (trainer:732) INFO: 41epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.101, 
loss_ctc=45.341, loss_att=22.975, acc=0.784, loss=29.685, backward_time=0.053, grad_norm=189.893, clip=100.000, loss_scale=566.346, optim_step_time=0.033, optim0_lr0=3.224e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:02:34,948 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:03:46,468 (trainer:732) INFO: 41epoch:train:2507-2864batch: iter_time=0.005, forward_time=0.099, loss_ctc=41.771, loss_att=21.151, acc=0.786, loss=27.337, backward_time=0.053, grad_norm=181.751, clip=100.000, loss_scale=620.997, optim_step_time=0.033, optim0_lr0=3.222e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:05:18,482 (trainer:732) INFO: 41epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.100, loss_ctc=43.659, loss_att=22.071, acc=0.786, loss=28.548, backward_time=0.054, grad_norm=186.342, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.220e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:06:50,799 (trainer:732) INFO: 41epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.100, loss_ctc=43.398, loss_att=21.930, acc=0.786, loss=28.370, backward_time=0.053, grad_norm=183.338, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.218e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:08:23,100 (trainer:732) INFO: 41epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.099, loss_ctc=43.374, loss_att=22.009, acc=0.784, loss=28.418, backward_time=0.053, grad_norm=182.164, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.216e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:09:56,484 (trainer:732) INFO: 41epoch:train:3939-4296batch: iter_time=0.007, forward_time=0.100, 
loss_ctc=42.896, loss_att=21.719, acc=0.786, loss=28.072, backward_time=0.053, grad_norm=186.979, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.214e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:11:30,736 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:11:30,757 (trainer:732) INFO: 41epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.105, loss_ctc=41.921, loss_att=21.175, acc=0.789, loss=27.399, backward_time=0.053, grad_norm=180.511, clip=100.000, loss_scale=612.392, optim_step_time=0.033, optim0_lr0=3.212e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:11:36,500 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:13:04,045 (trainer:732) INFO: 41epoch:train:4655-5012batch: iter_time=0.002, forward_time=0.105, loss_ctc=44.291, loss_att=22.566, acc=0.784, loss=29.083, backward_time=0.053, grad_norm=186.435, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.210e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:14:34,892 (trainer:732) INFO: 41epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.098, loss_ctc=43.278, loss_att=21.874, acc=0.788, loss=28.295, backward_time=0.053, grad_norm=183.524, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.208e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:15:15,610 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:16:08,036 (trainer:732) INFO: 41epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.101, loss_ctc=42.762, loss_att=21.745, acc=0.786, loss=28.050, backward_time=0.053, grad_norm=187.215, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.206e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:17:41,512 (trainer:732) INFO: 41epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.102, loss_ctc=42.754, loss_att=21.759, acc=0.788, loss=28.057, backward_time=0.053, grad_norm=180.797, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.204e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:19:14,152 (trainer:732) INFO: 41epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.100, loss_ctc=43.721, loss_att=22.226, acc=0.785, loss=28.674, backward_time=0.054, grad_norm=190.514, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.202e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:20:20,139 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:20:47,337 (trainer:732) INFO: 41epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.101, loss_ctc=44.304, loss_att=22.530, acc=0.787, loss=29.062, backward_time=0.055, grad_norm=187.207, clip=100.000, loss_scale=572.235, optim_step_time=0.033, optim0_lr0=3.200e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:22:22,057 (trainer:732) INFO: 41epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.101, loss_ctc=42.697, loss_att=21.693, acc=0.790, loss=27.994, backward_time=0.055, grad_norm=186.981, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.198e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:23:31,065 (trainer:338) INFO: 41epoch results: [train] iter_time=0.004, forward_time=0.101, loss_ctc=43.382, loss_att=21.997, acc=0.786, loss=28.412, backward_time=0.053, grad_norm=184.360, clip=100.000, loss_scale=528.955, optim_step_time=0.033, optim0_lr0=3.217e-05, train_time=0.258, time=30 minutes and 50.34 seconds, total_count=293601, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=21.759, cer_ctc=0.121, loss_att=11.118, acc=0.893, cer=0.069, wer=0.786, loss=14.310, time=14.6 seconds, total_count=2173, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.73 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:23:34,651 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:23:34,674 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/31epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:23:34,674 (trainer:272) INFO: 42/100epoch started. 
Estimated time to finish: 1 day, 7 hours and 37 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:23:37,220 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:25:07,184 (trainer:732) INFO: 42epoch:train:1-358batch: iter_time=0.004, forward_time=0.101, loss_ctc=43.324, loss_att=21.992, acc=0.788, loss=28.392, backward_time=0.053, grad_norm=182.137, clip=100.000, loss_scale=261.020, optim_step_time=0.033, optim0_lr0=3.196e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:26:39,538 (trainer:732) INFO: 42epoch:train:359-716batch: iter_time=0.001, forward_time=0.102, loss_ctc=43.532, loss_att=22.048, acc=0.789, loss=28.493, backward_time=0.053, grad_norm=180.986, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.194e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:28:10,430 (trainer:732) INFO: 42epoch:train:717-1074batch: iter_time=0.002, forward_time=0.100, loss_ctc=41.933, loss_att=21.154, acc=0.790, loss=27.388, backward_time=0.053, grad_norm=180.459, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.192e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:29:41,544 (trainer:732) INFO: 42epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.099, loss_ctc=42.817, loss_att=21.719, acc=0.789, loss=28.048, backward_time=0.053, grad_norm=184.828, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.190e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:30:21,850 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:31:12,555 (trainer:732) INFO: 42epoch:train:1433-1790batch: iter_time=8.365e-04, forward_time=0.099, loss_ctc=44.681, loss_att=22.689, acc=0.788, loss=29.286, backward_time=0.054, grad_norm=184.575, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.189e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:32:03,076 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:32:44,748 (trainer:732) INFO: 42epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.100, loss_ctc=41.946, loss_att=21.296, acc=0.789, loss=27.491, backward_time=0.053, grad_norm=179.524, clip=100.000, loss_scale=356.112, optim_step_time=0.033, optim0_lr0=3.187e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:34:16,572 (trainer:732) INFO: 42epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.100, loss_ctc=41.589, loss_att=20.979, acc=0.792, loss=27.162, backward_time=0.053, grad_norm=181.673, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.185e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:35:48,615 (trainer:732) INFO: 42epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.100, loss_ctc=44.454, loss_att=22.533, acc=0.789, loss=29.110, backward_time=0.053, grad_norm=188.570, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.183e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:37:20,334 (trainer:732) INFO: 42epoch:train:2865-3222batch: iter_time=0.002, forward_time=0.100, loss_ctc=43.652, loss_att=22.175, acc=0.787, loss=28.618, backward_time=0.053, grad_norm=182.167, 
clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.181e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:38:53,053 (trainer:732) INFO: 42epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.099, loss_ctc=41.818, loss_att=21.071, acc=0.791, loss=27.295, backward_time=0.054, grad_norm=179.666, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.179e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:40:25,164 (trainer:732) INFO: 42epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.100, loss_ctc=42.459, loss_att=21.489, acc=0.789, loss=27.780, backward_time=0.053, grad_norm=183.402, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.177e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:40:45,414 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:41:56,563 (trainer:732) INFO: 42epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.099, loss_ctc=42.864, loss_att=21.693, acc=0.790, loss=28.044, backward_time=0.054, grad_norm=183.843, clip=100.000, loss_scale=527.776, optim_step_time=0.033, optim0_lr0=3.175e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:43:28,632 (trainer:732) INFO: 42epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.099, loss_ctc=42.416, loss_att=21.500, acc=0.789, loss=27.775, backward_time=0.054, grad_norm=183.849, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.173e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:45:01,260 (trainer:732) INFO: 42epoch:train:4655-5012batch: iter_time=0.007, forward_time=0.099, loss_ctc=42.426, loss_att=21.486, acc=0.790, loss=27.768, backward_time=0.053, grad_norm=182.806, 
clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.171e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:46:34,541 (trainer:732) INFO: 42epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.101, loss_ctc=43.308, loss_att=22.108, acc=0.791, loss=28.468, backward_time=0.053, grad_norm=190.006, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.169e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:48:07,237 (trainer:732) INFO: 42epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.100, loss_ctc=43.023, loss_att=21.776, acc=0.789, loss=28.150, backward_time=0.054, grad_norm=184.708, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.167e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:49:38,473 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:49:38,853 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:49:41,722 (trainer:732) INFO: 42epoch:train:5729-6086batch: iter_time=0.012, forward_time=0.099, loss_ctc=42.196, loss_att=21.376, acc=0.789, loss=27.622, backward_time=0.053, grad_norm=182.778, clip=100.000, loss_scale=588.011, optim_step_time=0.033, optim0_lr0=3.166e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:51:13,623 (trainer:732) INFO: 42epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.099, loss_ctc=42.883, loss_att=21.699, acc=0.791, loss=28.054, backward_time=0.053, grad_norm=183.446, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.164e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:52:47,843 (trainer:732) INFO: 42epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.100, loss_ctc=42.134, loss_att=21.291, acc=0.790, loss=27.544, backward_time=0.055, grad_norm=180.254, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.162e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:54:19,992 (trainer:732) INFO: 42epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.098, loss_ctc=42.221, loss_att=21.401, acc=0.790, loss=27.647, backward_time=0.053, grad_norm=178.724, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.160e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:55:28,847 (trainer:338) INFO: 42epoch results: [train] iter_time=0.005, forward_time=0.100, loss_ctc=42.771, loss_att=21.667, acc=0.789, loss=27.998, backward_time=0.053, grad_norm=182.921, clip=100.000, loss_scale=445.049, optim_step_time=0.033, optim0_lr0=3.178e-05, train_time=0.257, time=30 minutes and 46.04 seconds, total_count=300762, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=21.527, cer_ctc=0.119, loss_att=11.020, acc=0.893, cer=0.068, 
wer=0.782, loss=14.172, time=14.45 seconds, total_count=2226, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.68 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:55:32,642 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:55:32,665 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/32epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:55:32,665 (trainer:272) INFO: 43/100epoch started. Estimated time to finish: 1 day, 7 hours and 5 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:57:02,936 (trainer:732) INFO: 43epoch:train:1-358batch: iter_time=0.002, forward_time=0.098, loss_ctc=43.249, loss_att=21.864, acc=0.789, loss=28.280, backward_time=0.054, grad_norm=186.598, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.158e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:57:07,767 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:58:32,674 (trainer:732) INFO: 43epoch:train:359-716batch: iter_time=7.357e-04, forward_time=0.098, loss_ctc=42.506, loss_att=21.490, acc=0.793, loss=27.795, backward_time=0.054, grad_norm=186.455, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.156e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 15:59:40,848 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:00:03,053 (trainer:732) INFO: 43epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=41.594, loss_att=21.004, acc=0.794, loss=27.181, backward_time=0.053, grad_norm=181.321, clip=100.000, loss_scale=615.261, optim_step_time=0.033, optim0_lr0=3.154e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:01:34,821 (trainer:732) INFO: 43epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.100, loss_ctc=41.424, loss_att=20.897, acc=0.796, loss=27.055, backward_time=0.053, grad_norm=186.489, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.152e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:03:06,365 (trainer:732) INFO: 43epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.100, loss_ctc=42.249, loss_att=21.352, acc=0.794, loss=27.621, backward_time=0.053, grad_norm=185.298, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.151e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:03:23,605 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:04:38,201 (trainer:732) INFO: 43epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.100, loss_ctc=40.074, loss_att=20.232, acc=0.794, loss=26.184, backward_time=0.053, grad_norm=178.823, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.149e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:04:52,711 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:06:09,966 (trainer:732) INFO: 43epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.100, loss_ctc=42.265, loss_att=21.392, acc=0.791, loss=27.654, backward_time=0.053, grad_norm=186.535, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.147e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:07:43,314 (trainer:732) INFO: 43epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.101, loss_ctc=43.811, loss_att=22.252, acc=0.787, loss=28.719, backward_time=0.054, grad_norm=186.972, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.145e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:08:22,915 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:09:15,644 (trainer:732) INFO: 43epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.100, loss_ctc=44.492, loss_att=22.611, acc=0.789, loss=29.175, backward_time=0.053, grad_norm=187.406, clip=100.000, loss_scale=550.723, optim_step_time=0.033, optim0_lr0=3.143e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:10:48,050 (trainer:732) INFO: 43epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.099, loss_ctc=41.731, loss_att=21.073, acc=0.791, loss=27.270, backward_time=0.053, grad_norm=181.394, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.141e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:12:20,211 (trainer:732) INFO: 43epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.100, loss_ctc=42.760, loss_att=21.687, acc=0.789, loss=28.009, backward_time=0.054, grad_norm=186.026, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.139e-05, train_time=0.257 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:13:52,434 (trainer:732) INFO: 43epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.099, loss_ctc=43.354, loss_att=21.947, acc=0.790, loss=28.369, backward_time=0.054, grad_norm=183.274, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.138e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:15:25,469 (trainer:732) INFO: 43epoch:train:4297-4654batch: iter_time=0.008, forward_time=0.099, loss_ctc=41.717, loss_att=21.128, acc=0.794, loss=27.305, backward_time=0.053, grad_norm=176.554, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.136e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:16:59,314 (trainer:732) INFO: 43epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.101, loss_ctc=44.520, loss_att=22.578, acc=0.789, loss=29.160, backward_time=0.054, grad_norm=185.586, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.134e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:17:05,299 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:18:30,950 (trainer:732) INFO: 43epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.099, loss_ctc=41.255, loss_att=20.888, acc=0.794, loss=26.998, backward_time=0.053, grad_norm=181.390, clip=100.000, loss_scale=540.683, optim_step_time=0.033, optim0_lr0=3.132e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:20:03,685 (trainer:732) INFO: 43epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.100, loss_ctc=43.658, loss_att=22.131, acc=0.790, loss=28.589, backward_time=0.054, grad_norm=185.990, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.130e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:21:35,152 (trainer:732) INFO: 43epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.098, loss_ctc=40.823, loss_att=20.623, acc=0.796, loss=26.683, backward_time=0.054, grad_norm=187.852, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.128e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:23:10,441 (trainer:732) INFO: 43epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.103, loss_ctc=41.399, loss_att=20.987, acc=0.792, loss=27.111, backward_time=0.053, grad_norm=184.771, clip=100.000, loss_scale=512.000, optim_step_time=0.035, optim0_lr0=3.127e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:24:43,829 (trainer:732) INFO: 43epoch:train:6445-6802batch: iter_time=0.008, forward_time=0.100, loss_ctc=40.457, loss_att=20.444, acc=0.799, loss=26.448, backward_time=0.053, grad_norm=180.004, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.125e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:26:13,486 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:26:17,534 (trainer:732) INFO: 43epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.100, loss_ctc=42.558, loss_att=21.563, acc=0.792, loss=27.862, backward_time=0.053, grad_norm=184.751, clip=100.000, loss_scale=666.891, optim_step_time=0.033, optim0_lr0=3.123e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:27:25,948 (trainer:338) INFO: 43epoch results: [train] iter_time=0.004, forward_time=0.100, loss_ctc=42.270, loss_att=21.394, acc=0.792, loss=27.657, backward_time=0.053, grad_norm=184.171, clip=100.000, loss_scale=528.239, optim_step_time=0.033, optim0_lr0=3.140e-05, train_time=0.257, time=30 minutes and 45.5 seconds, total_count=307923, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=21.072, cer_ctc=0.117, loss_att=10.759, acc=0.897, cer=0.066, wer=0.770, loss=13.853, time=14.47 seconds, total_count=2279, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.31 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:27:29,762 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:27:29,784 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/33epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:27:29,785 (trainer:272) INFO: 44/100epoch started. 
Estimated time to finish: 1 day, 6 hours and 32 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:29:01,464 (trainer:732) INFO: 44epoch:train:1-358batch: iter_time=0.003, forward_time=0.100, loss_ctc=41.137, loss_att=20.767, acc=0.795, loss=26.878, backward_time=0.053, grad_norm=183.894, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.121e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:30:32,027 (trainer:732) INFO: 44epoch:train:359-716batch: iter_time=5.708e-04, forward_time=0.099, loss_ctc=42.683, loss_att=21.468, acc=0.796, loss=27.832, backward_time=0.053, grad_norm=182.409, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.119e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:31:41,538 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:32:02,774 (trainer:732) INFO: 44epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=41.221, loss_att=20.770, acc=0.796, loss=26.905, backward_time=0.053, grad_norm=180.351, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.117e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:33:34,335 (trainer:732) INFO: 44epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.100, loss_ctc=40.936, loss_att=20.658, acc=0.799, loss=26.742, backward_time=0.053, grad_norm=180.182, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.116e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:35:04,638 (trainer:732) INFO: 44epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.098, loss_ctc=43.095, loss_att=21.821, acc=0.793, loss=28.203, backward_time=0.054, grad_norm=194.626, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.114e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:36:01,062 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:36:37,169 (trainer:732) INFO: 44epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.100, loss_ctc=41.743, loss_att=21.079, acc=0.797, loss=27.278, backward_time=0.054, grad_norm=184.559, clip=100.000, loss_scale=547.854, optim_step_time=0.033, optim0_lr0=3.112e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:38:08,772 (trainer:732) INFO: 44epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.099, loss_ctc=41.329, loss_att=20.879, acc=0.795, loss=27.014, backward_time=0.054, grad_norm=182.383, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.110e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:39:08,365 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:39:42,321 (trainer:732) INFO: 44epoch:train:2507-2864batch: iter_time=0.008, forward_time=0.099, loss_ctc=40.903, loss_att=20.757, acc=0.794, loss=26.801, backward_time=0.054, grad_norm=185.645, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.108e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:41:14,562 (trainer:732) INFO: 44epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.100, loss_ctc=41.512, loss_att=21.031, acc=0.795, loss=27.176, backward_time=0.054, grad_norm=184.488, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.107e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:42:47,199 (trainer:732) INFO: 44epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.099, loss_ctc=41.770, loss_att=21.099, acc=0.796, loss=27.301, backward_time=0.053, grad_norm=187.772, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.105e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:44:19,689 (trainer:732) INFO: 44epoch:train:3581-3938batch: iter_time=0.008, forward_time=0.099, loss_ctc=40.587, loss_att=20.446, acc=0.798, loss=26.488, backward_time=0.053, grad_norm=181.840, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.103e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:45:51,592 (trainer:732) INFO: 44epoch:train:3939-4296batch: iter_time=0.007, forward_time=0.098, loss_ctc=41.810, loss_att=21.298, acc=0.794, loss=27.451, backward_time=0.053, grad_norm=186.522, clip=100.000, loss_scale=922.458, optim_step_time=0.033, optim0_lr0=3.101e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:45:59,687 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:47:22,314 (trainer:732) INFO: 44epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.098, loss_ctc=41.866, loss_att=21.209, acc=0.793, loss=27.406, backward_time=0.053, grad_norm=180.816, clip=100.000, loss_scale=555.025, optim_step_time=0.033, optim0_lr0=3.100e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:48:55,225 (trainer:732) INFO: 44epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.100, loss_ctc=42.650, loss_att=21.649, acc=0.792, loss=27.950, backward_time=0.053, grad_norm=185.900, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.098e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:50:27,141 (trainer:732) INFO: 44epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.098, loss_ctc=41.419, loss_att=20.978, acc=0.792, loss=27.110, backward_time=0.053, grad_norm=181.812, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.096e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:51:59,283 (trainer:732) INFO: 44epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.099, loss_ctc=42.568, loss_att=21.548, acc=0.792, loss=27.854, backward_time=0.053, grad_norm=187.954, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.094e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:53:34,502 (trainer:732) INFO: 44epoch:train:5729-6086batch: iter_time=0.010, forward_time=0.101, loss_ctc=41.525, loss_att=21.048, acc=0.794, loss=27.191, backward_time=0.053, grad_norm=182.781, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.092e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:55:08,299 (trainer:732) INFO: 44epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.099, loss_ctc=41.998, loss_att=21.201, acc=0.794, loss=27.440, backward_time=0.055, grad_norm=184.544, clip=100.000, loss_scale=679.330, optim_step_time=0.033, optim0_lr0=3.091e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:55:17,362 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:56:40,782 (trainer:732) INFO: 44epoch:train:6445-6802batch: iter_time=0.004, forward_time=0.100, loss_ctc=42.749, loss_att=21.661, acc=0.796, loss=27.987, backward_time=0.054, grad_norm=187.939, clip=100.000, loss_scale=562.196, optim_step_time=0.033, optim0_lr0=3.089e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:56:47,670 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:58:13,750 (trainer:732) INFO: 44epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.099, loss_ctc=41.872, loss_att=21.169, acc=0.795, loss=27.380, backward_time=0.053, grad_norm=181.378, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.087e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:59:22,197 (trainer:338) INFO: 44epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=41.755, loss_att=21.120, acc=0.795, loss=27.310, backward_time=0.053, grad_norm=184.387, clip=100.000, loss_scale=547.335, optim_step_time=0.033, optim0_lr0=3.104e-05, train_time=0.257, time=30 minutes and 44.59 seconds, total_count=315084, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=20.717, cer_ctc=0.115, loss_att=10.632, acc=0.898, cer=0.065, wer=0.767, loss=13.657, time=14.37 seconds, total_count=2332, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.44 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:59:26,107 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:59:26,124 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/34epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 16:59:26,124 (trainer:272) INFO: 45/100epoch started. Estimated time to finish: 1 day, 6 hours and 15.53 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:00:03,518 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:00:57,929 (trainer:732) INFO: 45epoch:train:1-358batch: iter_time=0.003, forward_time=0.100, loss_ctc=40.418, loss_att=20.424, acc=0.796, loss=26.422, backward_time=0.054, grad_norm=182.104, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.085e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:01:03,770 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:02:28,232 (trainer:732) INFO: 45epoch:train:359-716batch: iter_time=0.001, forward_time=0.099, loss_ctc=40.718, loss_att=20.529, acc=0.799, loss=26.586, backward_time=0.054, grad_norm=182.449, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.084e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:03:59,285 (trainer:732) INFO: 45epoch:train:717-1074batch: iter_time=6.992e-04, forward_time=0.099, loss_ctc=42.360, loss_att=21.368, acc=0.798, loss=27.666, backward_time=0.055, grad_norm=180.760, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.082e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:05:28,230 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:05:30,654 (trainer:732) INFO: 45epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=40.802, loss_att=20.556, acc=0.800, loss=26.630, backward_time=0.054, grad_norm=179.188, clip=100.000, loss_scale=659.720, optim_step_time=0.033, optim0_lr0=3.080e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:07:03,633 (trainer:732) INFO: 45epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.102, loss_ctc=42.534, loss_att=21.632, acc=0.795, loss=27.903, backward_time=0.053, grad_norm=184.224, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.078e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:07:45,259 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:08:33,925 (trainer:732) INFO: 45epoch:train:1791-2148batch: iter_time=0.001, forward_time=0.099, loss_ctc=40.500, loss_att=20.431, acc=0.800, loss=26.452, backward_time=0.053, grad_norm=187.245, clip=100.000, loss_scale=373.602, optim_step_time=0.033, optim0_lr0=3.077e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:10:07,651 (trainer:732) INFO: 45epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.102, loss_ctc=42.197, loss_att=21.324, acc=0.795, loss=27.586, backward_time=0.053, grad_norm=185.594, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.075e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:11:39,647 (trainer:732) INFO: 45epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.100, loss_ctc=41.780, loss_att=21.125, acc=0.797, loss=27.321, backward_time=0.053, grad_norm=188.561, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.073e-05, train_time=0.257 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:11:43,068 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:13:09,957 (trainer:732) INFO: 45epoch:train:2865-3222batch: iter_time=0.002, forward_time=0.098, loss_ctc=42.790, loss_att=21.649, acc=0.796, loss=27.992, backward_time=0.053, grad_norm=185.714, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.072e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:14:41,614 (trainer:732) INFO: 45epoch:train:3223-3580batch: iter_time=0.008, forward_time=0.098, loss_ctc=38.633, loss_att=19.471, acc=0.802, loss=25.219, backward_time=0.053, grad_norm=184.940, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.070e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:16:13,458 (trainer:732) INFO: 45epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.099, loss_ctc=42.237, loss_att=21.398, acc=0.795, loss=27.650, backward_time=0.053, grad_norm=189.112, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.068e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:17:45,221 (trainer:732) INFO: 45epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.099, loss_ctc=40.829, loss_att=20.481, acc=0.802, loss=26.585, backward_time=0.054, grad_norm=185.157, clip=100.000, loss_scale=499.844, optim_step_time=0.033, optim0_lr0=3.066e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:19:17,588 (trainer:732) INFO: 45epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.099, loss_ctc=41.276, loss_att=20.850, acc=0.798, loss=26.978, backward_time=0.054, grad_norm=189.673, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.065e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:20:48,445 (trainer:732) INFO: 45epoch:train:4655-5012batch: iter_time=0.002, forward_time=0.098, loss_ctc=43.331, loss_att=21.901, acc=0.793, loss=28.330, backward_time=0.054, grad_norm=190.081, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.063e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:22:19,841 (trainer:732) INFO: 45epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.098, loss_ctc=41.444, loss_att=20.991, acc=0.796, loss=27.127, backward_time=0.054, grad_norm=185.928, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.061e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:23:51,209 (trainer:732) INFO: 45epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.097, loss_ctc=40.567, loss_att=20.500, acc=0.798, loss=26.520, backward_time=0.055, grad_norm=185.597, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.059e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:25:26,470 (trainer:732) INFO: 45epoch:train:5729-6086batch: iter_time=0.010, forward_time=0.101, loss_ctc=42.163, loss_att=21.355, acc=0.797, loss=27.597, backward_time=0.054, grad_norm=186.669, clip=100.000, loss_scale=699.352, optim_step_time=0.033, optim0_lr0=3.058e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:27:00,012 (trainer:732) INFO: 45epoch:train:6087-6444batch: iter_time=0.012, forward_time=0.098, loss_ctc=39.086, loss_att=19.779, acc=0.800, loss=25.571, backward_time=0.053, grad_norm=176.083, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.033, optim0_lr0=3.056e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:28:31,570 
(trainer:732) INFO: 45epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.097, loss_ctc=40.220, loss_att=20.434, acc=0.799, loss=26.370, backward_time=0.054, grad_norm=178.002, clip=100.000, loss_scale=1.024e+03, optim_step_time=0.033, optim0_lr0=3.054e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:29:14,480 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:30:06,002 (trainer:732) INFO: 45epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.100, loss_ctc=41.751, loss_att=21.173, acc=0.796, loss=27.346, backward_time=0.053, grad_norm=184.611, clip=100.000, loss_scale=747.204, optim_step_time=0.033, optim0_lr0=3.053e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:31:14,872 (trainer:338) INFO: 45epoch results: [train] iter_time=0.004, forward_time=0.099, loss_ctc=41.252, loss_att=20.853, acc=0.798, loss=26.973, backward_time=0.054, grad_norm=184.573, clip=100.000, loss_scale=520.154, optim_step_time=0.033, optim0_lr0=3.069e-05, train_time=0.257, time=30 minutes and 40.56 seconds, total_count=322245, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=20.376, cer_ctc=0.112, loss_att=10.473, acc=0.898, cer=0.066, wer=0.762, loss=13.444, time=14.63 seconds, total_count=2385, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.55 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:31:18,618 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:31:18,639 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/35epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:31:18,639 (trainer:272) INFO: 46/100epoch started. 
Estimated time to finish: 1 day, 5 hours and 27 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:31:54,233 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:32:50,199 (trainer:732) INFO: 46epoch:train:1-358batch: iter_time=0.003, forward_time=0.099, loss_ctc=39.920, loss_att=20.000, acc=0.801, loss=25.976, backward_time=0.055, grad_norm=181.507, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.051e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:33:04,016 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:34:21,152 (trainer:732) INFO: 46epoch:train:359-716batch: iter_time=0.001, forward_time=0.100, loss_ctc=41.984, loss_att=21.188, acc=0.796, loss=27.427, backward_time=0.053, grad_norm=191.380, clip=100.000, loss_scale=294.006, optim_step_time=0.033, optim0_lr0=3.049e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:35:52,218 (trainer:732) INFO: 46epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=40.365, loss_att=20.259, acc=0.802, loss=26.291, backward_time=0.054, grad_norm=177.577, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.048e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:37:22,240 (trainer:732) INFO: 46epoch:train:1075-1432batch: iter_time=8.010e-04, forward_time=0.099, loss_ctc=42.256, loss_att=21.387, acc=0.797, loss=27.648, backward_time=0.053, grad_norm=185.417, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.046e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 
17:38:53,769 (trainer:732) INFO: 46epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.099, loss_ctc=39.413, loss_att=19.804, acc=0.802, loss=25.687, backward_time=0.053, grad_norm=174.599, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.044e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:38:56,485 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:40:25,654 (trainer:732) INFO: 46epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.100, loss_ctc=40.334, loss_att=20.404, acc=0.804, loss=26.383, backward_time=0.054, grad_norm=188.300, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=3.043e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:40:26,438 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:41:58,316 (trainer:732) INFO: 46epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.101, loss_ctc=40.571, loss_att=20.487, acc=0.802, loss=26.512, backward_time=0.053, grad_norm=188.358, clip=100.000, loss_scale=323.218, optim_step_time=0.033, optim0_lr0=3.041e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:43:29,279 (trainer:732) INFO: 46epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.098, loss_ctc=41.810, loss_att=21.061, acc=0.801, loss=27.286, backward_time=0.053, grad_norm=182.066, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.039e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:45:01,361 (trainer:732) INFO: 46epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.099, loss_ctc=40.855, loss_att=20.557, acc=0.803, loss=26.646, backward_time=0.053, grad_norm=182.708, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.038e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:46:34,019 (trainer:732) INFO: 46epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.100, loss_ctc=40.674, loss_att=20.648, acc=0.799, loss=26.656, backward_time=0.054, grad_norm=183.926, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.036e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:48:05,777 (trainer:732) INFO: 46epoch:train:3581-3938batch: iter_time=0.003, forward_time=0.100, loss_ctc=41.553, loss_att=20.978, acc=0.799, loss=27.151, backward_time=0.055, grad_norm=183.629, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.034e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:49:38,286 (trainer:732) INFO: 46epoch:train:3939-4296batch: iter_time=0.006, 
forward_time=0.099, loss_ctc=40.807, loss_att=20.704, acc=0.798, loss=26.735, backward_time=0.054, grad_norm=186.984, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.033e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:50:50,855 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:51:08,433 (trainer:732) INFO: 46epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.097, loss_ctc=41.459, loss_att=20.882, acc=0.800, loss=27.055, backward_time=0.053, grad_norm=183.649, clip=100.000, loss_scale=755.810, optim_step_time=0.033, optim0_lr0=3.031e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:52:41,139 (trainer:732) INFO: 46epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.099, loss_ctc=40.698, loss_att=20.547, acc=0.801, loss=26.592, backward_time=0.053, grad_norm=182.626, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.029e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:54:13,870 (trainer:732) INFO: 46epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.100, loss_ctc=41.025, loss_att=20.756, acc=0.800, loss=26.837, backward_time=0.053, grad_norm=182.122, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.028e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:55:47,113 (trainer:732) INFO: 46epoch:train:5371-5728batch: iter_time=0.008, forward_time=0.099, loss_ctc=41.113, loss_att=20.718, acc=0.799, loss=26.836, backward_time=0.053, grad_norm=180.313, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.026e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:57:19,355 (trainer:732) INFO: 46epoch:train:5729-6086batch: iter_time=0.003, 
forward_time=0.100, loss_ctc=41.345, loss_att=20.929, acc=0.802, loss=27.054, backward_time=0.054, grad_norm=189.950, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.024e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:58:51,333 (trainer:732) INFO: 46epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.099, loss_ctc=40.552, loss_att=20.483, acc=0.801, loss=26.504, backward_time=0.053, grad_norm=188.996, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.023e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 17:59:46,403 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:00:25,075 (trainer:732) INFO: 46epoch:train:6445-6802batch: iter_time=0.011, forward_time=0.099, loss_ctc=38.716, loss_att=19.535, acc=0.801, loss=25.289, backward_time=0.053, grad_norm=179.617, clip=100.000, loss_scale=608.090, optim_step_time=0.033, optim0_lr0=3.021e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:01:58,679 (trainer:732) INFO: 46epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.099, loss_ctc=40.799, loss_att=20.628, acc=0.798, loss=26.679, backward_time=0.053, grad_norm=182.182, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.019e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:03:07,507 (trainer:338) INFO: 46epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=40.793, loss_att=20.588, acc=0.800, loss=26.649, backward_time=0.053, grad_norm=183.779, clip=100.000, loss_scale=457.424, optim_step_time=0.033, optim0_lr0=3.035e-05, train_time=0.257, time=30 minutes and 40.71 seconds, total_count=329406, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=20.029, cer_ctc=0.110, loss_att=10.273, acc=0.901, cer=0.064, 
wer=0.754, loss=13.200, time=14.38 seconds, total_count=2438, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.77 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:03:11,385 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:03:11,407 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/36epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:03:11,408 (trainer:272) INFO: 47/100epoch started. Estimated time to finish: 1 day, 4 hours and 55 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:04:42,996 (trainer:732) INFO: 47epoch:train:1-358batch: iter_time=0.002, forward_time=0.101, loss_ctc=39.423, loss_att=19.800, acc=0.805, loss=25.687, backward_time=0.053, grad_norm=182.495, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.018e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:06:15,078 (trainer:732) INFO: 47epoch:train:359-716batch: iter_time=0.002, forward_time=0.100, loss_ctc=40.523, loss_att=20.480, acc=0.803, loss=26.493, backward_time=0.054, grad_norm=182.589, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=3.016e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:07:46,178 (trainer:732) INFO: 47epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=40.829, loss_att=20.630, acc=0.801, loss=26.690, backward_time=0.054, grad_norm=188.069, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.014e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:08:45,694 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is 
correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:09:18,013 (trainer:732) INFO: 47epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=41.039, loss_att=20.723, acc=0.801, loss=26.818, backward_time=0.054, grad_norm=185.446, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.013e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:09:41,408 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:10:50,675 (trainer:732) INFO: 47epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.101, loss_ctc=39.227, loss_att=19.763, acc=0.804, loss=25.602, backward_time=0.054, grad_norm=182.563, clip=100.000, loss_scale=557.894, optim_step_time=0.033, optim0_lr0=3.011e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:12:21,166 (trainer:732) INFO: 47epoch:train:1791-2148batch: iter_time=9.062e-04, forward_time=0.099, loss_ctc=41.144, loss_att=20.694, acc=0.803, loss=26.829, backward_time=0.054, grad_norm=184.883, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.009e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:13:53,261 (trainer:732) INFO: 47epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.100, loss_ctc=40.455, loss_att=20.376, acc=0.802, loss=26.400, backward_time=0.053, grad_norm=185.103, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.008e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:15:24,296 (trainer:732) INFO: 47epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.100, loss_ctc=39.946, loss_att=20.109, acc=0.804, loss=26.060, backward_time=0.053, grad_norm=185.097, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.006e-05, 
train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:16:55,962 (trainer:732) INFO: 47epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.099, loss_ctc=41.728, loss_att=21.062, acc=0.804, loss=27.262, backward_time=0.053, grad_norm=191.317, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.005e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:18:13,619 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:18:14,100 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:18:28,253 (trainer:732) INFO: 47epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.099, loss_ctc=39.093, loss_att=19.663, acc=0.806, loss=25.492, backward_time=0.053, grad_norm=181.067, clip=100.000, loss_scale=513.434, optim_step_time=0.033, optim0_lr0=3.003e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:20:01,779 (trainer:732) INFO: 47epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.102, loss_ctc=41.406, loss_att=20.916, acc=0.799, loss=27.063, backward_time=0.054, grad_norm=192.578, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.001e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:21:35,041 (trainer:732) INFO: 47epoch:train:3939-4296batch: iter_time=0.007, forward_time=0.100, loss_ctc=40.221, loss_att=20.329, acc=0.803, loss=26.296, backward_time=0.053, grad_norm=185.291, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=3.000e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:23:08,205 (trainer:732) INFO: 
47epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.100, loss_ctc=40.649, loss_att=20.470, acc=0.803, loss=26.524, backward_time=0.054, grad_norm=181.670, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.998e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:24:40,591 (trainer:732) INFO: 47epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.100, loss_ctc=42.326, loss_att=21.343, acc=0.801, loss=27.638, backward_time=0.053, grad_norm=188.153, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.997e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:25:36,680 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:26:14,414 (trainer:732) INFO: 47epoch:train:5013-5370batch: iter_time=0.009, forward_time=0.100, loss_ctc=38.904, loss_att=19.614, acc=0.804, loss=25.401, backward_time=0.054, grad_norm=182.370, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.995e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:27:17,503 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:27:46,124 (trainer:732) INFO: 47epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.099, loss_ctc=41.244, loss_att=20.738, acc=0.801, loss=26.889, backward_time=0.054, grad_norm=184.875, clip=100.000, loss_scale=639.641, optim_step_time=0.033, optim0_lr0=2.993e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:29:19,160 (trainer:732) INFO: 47epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.100, loss_ctc=40.518, loss_att=20.507, acc=0.800, loss=26.510, backward_time=0.054, grad_norm=183.131, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.992e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:30:51,850 (trainer:732) INFO: 47epoch:train:6087-6444batch: iter_time=0.008, forward_time=0.099, loss_ctc=40.577, loss_att=20.491, acc=0.801, loss=26.516, backward_time=0.053, grad_norm=182.754, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.990e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:32:24,707 (trainer:732) INFO: 47epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.100, loss_ctc=39.839, loss_att=20.219, acc=0.805, loss=26.105, backward_time=0.054, grad_norm=188.331, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.989e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:33:58,852 (trainer:732) INFO: 47epoch:train:6803-7160batch: iter_time=0.012, forward_time=0.099, loss_ctc=37.882, loss_att=19.027, acc=0.808, loss=24.684, backward_time=0.054, grad_norm=177.910, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.987e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:35:07,610 (trainer:338) INFO: 47epoch results: [train] iter_time=0.005, forward_time=0.100, 
loss_ctc=40.326, loss_att=20.336, acc=0.803, loss=26.333, backward_time=0.054, grad_norm=184.785, clip=100.000, loss_scale=520.726, optim_step_time=0.033, optim0_lr0=3.002e-05, train_time=0.258, time=30 minutes and 48.08 seconds, total_count=336567, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=19.790, cer_ctc=0.108, loss_att=10.161, acc=0.902, cer=0.063, wer=0.754, loss=13.050, time=14.56 seconds, total_count=2491, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.56 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:35:11,226 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:35:11,241 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/37epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:35:11,241 (trainer:272) INFO: 48/100epoch started. Estimated time to finish: 1 day, 4 hours and 23 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:36:42,394 (trainer:732) INFO: 48epoch:train:1-358batch: iter_time=0.003, forward_time=0.099, loss_ctc=39.081, loss_att=19.631, acc=0.807, loss=25.466, backward_time=0.053, grad_norm=180.547, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.985e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:37:18,678 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:38:14,717 (trainer:732) INFO: 48epoch:train:359-716batch: iter_time=0.004, forward_time=0.101, loss_ctc=39.386, loss_att=19.844, acc=0.806, loss=25.706, backward_time=0.054, grad_norm=180.932, clip=100.000, loss_scale=573.669, optim_step_time=0.033, optim0_lr0=2.984e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:39:45,009 (trainer:732) INFO: 48epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=39.752, loss_att=19.974, acc=0.805, loss=25.907, backward_time=0.053, grad_norm=183.687, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.982e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:41:16,011 (trainer:732) INFO: 48epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=40.527, loss_att=20.406, acc=0.804, loss=26.442, backward_time=0.053, grad_norm=186.179, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.981e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:41:45,951 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:42:46,879 (trainer:732) INFO: 48epoch:train:1433-1790batch: iter_time=9.633e-04, forward_time=0.100, loss_ctc=38.896, loss_att=19.532, acc=0.806, loss=25.341, backward_time=0.053, grad_norm=180.899, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.979e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:44:19,402 (trainer:732) INFO: 48epoch:train:1791-2148batch: iter_time=0.001, forward_time=0.102, loss_ctc=41.356, loss_att=20.921, acc=0.803, loss=27.052, backward_time=0.054, grad_norm=191.884, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.977e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:45:17,886 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:45:51,516 (trainer:732) INFO: 48epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.101, loss_ctc=41.662, loss_att=20.969, acc=0.804, loss=27.177, backward_time=0.054, grad_norm=186.447, clip=100.000, loss_scale=523.441, optim_step_time=0.033, optim0_lr0=2.976e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:45:54,572 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:47:23,728 (trainer:732) INFO: 48epoch:train:2507-2864batch: iter_time=0.005, forward_time=0.099, loss_ctc=37.822, loss_att=19.036, acc=0.806, loss=24.672, backward_time=0.054, grad_norm=180.637, clip=100.000, loss_scale=527.776, optim_step_time=0.033, optim0_lr0=2.974e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:48:05,797 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:48:55,937 (trainer:732) INFO: 48epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.100, loss_ctc=40.351, loss_att=20.388, acc=0.806, loss=26.377, backward_time=0.054, grad_norm=191.728, clip=100.000, loss_scale=372.168, optim_step_time=0.033, optim0_lr0=2.973e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:50:27,598 (trainer:732) INFO: 48epoch:train:3223-3580batch: iter_time=0.007, forward_time=0.097, loss_ctc=37.644, loss_att=18.957, acc=0.807, loss=24.563, backward_time=0.056, grad_norm=181.496, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.971e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:52:00,870 (trainer:732) INFO: 48epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.100, loss_ctc=39.750, loss_att=20.004, acc=0.806, loss=25.928, backward_time=0.053, grad_norm=183.236, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.970e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:53:33,195 (trainer:732) INFO: 48epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.099, loss_ctc=41.000, loss_att=20.731, acc=0.802, loss=26.812, backward_time=0.053, grad_norm=188.121, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.968e-05, train_time=0.258 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:55:06,109 (trainer:732) INFO: 48epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.100, loss_ctc=39.772, loss_att=20.082, acc=0.805, loss=25.989, backward_time=0.053, grad_norm=177.931, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.967e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:56:39,569 (trainer:732) INFO: 48epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.100, loss_ctc=40.845, loss_att=20.578, acc=0.805, loss=26.658, backward_time=0.054, grad_norm=181.211, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.965e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:58:11,625 (trainer:732) INFO: 48epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.099, loss_ctc=39.287, loss_att=19.729, acc=0.806, loss=25.596, backward_time=0.053, grad_norm=183.167, clip=100.000, loss_scale=501.274, optim_step_time=0.033, optim0_lr0=2.963e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 18:59:44,992 (trainer:732) INFO: 48epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.101, loss_ctc=41.371, loss_att=20.876, acc=0.804, loss=27.024, backward_time=0.053, grad_norm=187.787, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.962e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:01:18,392 (trainer:732) INFO: 48epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.100, loss_ctc=39.202, loss_att=19.709, acc=0.807, loss=25.557, backward_time=0.055, grad_norm=180.099, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.960e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:02:34,619 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:02:50,946 (trainer:732) INFO: 48epoch:train:6087-6444batch: iter_time=0.006, forward_time=0.098, loss_ctc=40.804, loss_att=20.636, acc=0.803, loss=26.686, backward_time=0.055, grad_norm=181.529, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.959e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:04:24,110 (trainer:732) INFO: 48epoch:train:6445-6802batch: iter_time=0.008, forward_time=0.099, loss_ctc=39.293, loss_att=19.768, acc=0.806, loss=25.625, backward_time=0.054, grad_norm=182.527, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.957e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:05:34,284 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:05:56,831 (trainer:732) INFO: 48epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.098, loss_ctc=40.970, loss_att=20.706, acc=0.805, loss=26.785, backward_time=0.053, grad_norm=186.789, clip=100.000, loss_scale=576.538, optim_step_time=0.033, optim0_lr0=2.956e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:07:05,706 (trainer:338) INFO: 48epoch results: [train] iter_time=0.005, forward_time=0.100, loss_ctc=39.917, loss_att=20.112, acc=0.805, loss=26.054, backward_time=0.054, grad_norm=183.846, clip=100.000, loss_scale=448.116, optim_step_time=0.033, optim0_lr0=2.970e-05, train_time=0.258, time=30 minutes and 46.31 seconds, total_count=343728, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=19.357, cer_ctc=0.106, loss_att=9.983, acc=0.904, cer=0.062, wer=0.748, loss=12.795, time=14.56 seconds, total_count=2544, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.59 seconds, 
total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:07:09,521 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:07:09,541 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/38epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:07:09,541 (trainer:272) INFO: 49/100epoch started. Estimated time to finish: 1 day, 3 hours and 50 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:08:39,972 (trainer:732) INFO: 49epoch:train:1-358batch: iter_time=0.003, forward_time=0.098, loss_ctc=39.472, loss_att=19.813, acc=0.809, loss=25.711, backward_time=0.053, grad_norm=183.178, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.954e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:10:11,011 (trainer:732) INFO: 49epoch:train:359-716batch: iter_time=0.003, forward_time=0.099, loss_ctc=41.330, loss_att=20.872, acc=0.803, loss=27.009, backward_time=0.052, grad_norm=190.824, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.953e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:11:42,035 (trainer:732) INFO: 49epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=38.383, loss_att=19.256, acc=0.810, loss=24.994, backward_time=0.054, grad_norm=179.711, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.951e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:13:12,044 (trainer:732) INFO: 49epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.098, loss_ctc=38.465, loss_att=19.316, acc=0.811, loss=25.061, backward_time=0.053, grad_norm=180.953, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.950e-05, 
train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:14:44,234 (trainer:732) INFO: 49epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.100, loss_ctc=42.101, loss_att=21.215, acc=0.803, loss=27.481, backward_time=0.053, grad_norm=191.121, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.948e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:15:30,453 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:16:15,202 (trainer:732) INFO: 49epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.099, loss_ctc=39.024, loss_att=19.676, acc=0.806, loss=25.480, backward_time=0.053, grad_norm=185.552, clip=100.000, loss_scale=592.314, optim_step_time=0.033, optim0_lr0=2.947e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:17:46,840 (trainer:732) INFO: 49epoch:train:2149-2506batch: iter_time=0.005, forward_time=0.099, loss_ctc=38.425, loss_att=19.236, acc=0.809, loss=24.993, backward_time=0.054, grad_norm=179.641, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.945e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:19:18,795 (trainer:732) INFO: 49epoch:train:2507-2864batch: iter_time=0.006, forward_time=0.099, loss_ctc=38.327, loss_att=19.298, acc=0.810, loss=25.007, backward_time=0.053, grad_norm=183.237, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.943e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:20:51,336 (trainer:732) INFO: 49epoch:train:2865-3222batch: iter_time=0.006, forward_time=0.099, loss_ctc=38.018, loss_att=19.210, acc=0.807, loss=24.853, backward_time=0.054, grad_norm=181.323, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.942e-05, 
train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:22:23,369 (trainer:732) INFO: 49epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.099, loss_ctc=38.786, loss_att=19.501, acc=0.808, loss=25.286, backward_time=0.056, grad_norm=183.086, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.940e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:23:55,762 (trainer:732) INFO: 49epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.099, loss_ctc=39.271, loss_att=19.711, acc=0.808, loss=25.579, backward_time=0.054, grad_norm=184.903, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.939e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:24:05,092 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:24:31,249 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:24:58,298 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:25:29,273 (trainer:732) INFO: 49epoch:train:3939-4296batch: iter_time=0.007, forward_time=0.100, loss_ctc=40.441, loss_att=20.370, acc=0.806, loss=26.391, backward_time=0.053, grad_norm=186.835, clip=100.000, loss_scale=514.868, optim_step_time=0.033, optim0_lr0=2.937e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:27:02,410 (trainer:732) INFO: 49epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.101, loss_ctc=41.263, loss_att=20.872, acc=0.804, loss=26.989, backward_time=0.055, grad_norm=187.656, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.936e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:27:34,866 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:28:35,134 (trainer:732) INFO: 49epoch:train:4655-5012batch: iter_time=0.007, forward_time=0.099, loss_ctc=39.154, loss_att=19.734, acc=0.807, loss=25.560, backward_time=0.054, grad_norm=181.582, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.934e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:30:08,447 (trainer:732) INFO: 49epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.100, loss_ctc=39.262, loss_att=19.761, acc=0.809, loss=25.611, backward_time=0.054, grad_norm=184.162, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.933e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:31:41,316 (trainer:732) INFO: 49epoch:train:5371-5728batch: iter_time=0.008, forward_time=0.098, loss_ctc=39.198, loss_att=19.662, acc=0.809, loss=25.523, backward_time=0.054, grad_norm=181.943, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.931e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:33:10,677 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:33:14,144 (trainer:732) INFO: 49epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.099, loss_ctc=39.539, loss_att=19.950, acc=0.806, loss=25.827, backward_time=0.054, grad_norm=186.617, clip=100.000, loss_scale=653.983, optim_step_time=0.033, optim0_lr0=2.930e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:34:47,080 (trainer:732) INFO: 49epoch:train:6087-6444batch: iter_time=0.006, forward_time=0.100, loss_ctc=40.648, loss_att=20.588, acc=0.808, loss=26.606, backward_time=0.053, grad_norm=187.729, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.928e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:36:20,606 (trainer:732) INFO: 49epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.100, loss_ctc=40.795, loss_att=20.470, acc=0.804, loss=26.568, backward_time=0.053, grad_norm=186.991, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.927e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:37:52,895 (trainer:732) INFO: 49epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.099, loss_ctc=38.981, loss_att=19.570, acc=0.808, loss=25.393, backward_time=0.053, grad_norm=185.439, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=2.925e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:39:01,625 (trainer:338) INFO: 49epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=39.528, loss_att=19.896, acc=0.807, loss=25.785, backward_time=0.054, grad_norm=184.636, clip=100.000, 
loss_scale=523.230, optim_step_time=0.033, optim0_lr0=2.940e-05, train_time=0.257, time=30 minutes and 44.06 seconds, total_count=350889, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=19.241, cer_ctc=0.104, loss_att=9.907, acc=0.904, cer=0.062, wer=0.748, loss=12.707, time=14.26 seconds, total_count=2597, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.75 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:39:05,426 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:39:05,451 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/39epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:39:05,451 (trainer:272) INFO: 50/100epoch started. Estimated time to finish: 1 day, 3 hours and 18 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:40:37,814 (trainer:732) INFO: 50epoch:train:1-358batch: iter_time=0.007, forward_time=0.099, loss_ctc=38.870, loss_att=19.575, acc=0.809, loss=25.364, backward_time=0.053, grad_norm=184.457, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.924e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:42:08,703 (trainer:732) INFO: 50epoch:train:359-716batch: iter_time=0.001, forward_time=0.100, loss_ctc=38.395, loss_att=19.225, acc=0.810, loss=24.976, backward_time=0.053, grad_norm=183.625, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.922e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:43:13,265 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:43:39,009 (trainer:732) INFO: 50epoch:train:717-1074batch: iter_time=7.435e-04, forward_time=0.099, loss_ctc=39.118, loss_att=19.681, acc=0.809, loss=25.512, backward_time=0.054, grad_norm=183.516, clip=100.000, loss_scale=598.050, optim_step_time=0.033, optim0_lr0=2.921e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:44:24,849 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:45:10,334 (trainer:732) INFO: 50epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.099, loss_ctc=41.106, loss_att=20.694, acc=0.804, loss=26.818, backward_time=0.053, grad_norm=189.636, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.919e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:45:45,451 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:46:40,222 (trainer:732) INFO: 50epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.098, loss_ctc=37.631, loss_att=18.920, acc=0.810, loss=24.533, backward_time=0.053, grad_norm=182.911, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.918e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:48:12,129 (trainer:732) INFO: 50epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.100, loss_ctc=38.939, loss_att=19.598, acc=0.809, loss=25.400, backward_time=0.053, grad_norm=187.333, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.916e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:49:35,550 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:49:41,449 (trainer:732) INFO: 50epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.097, loss_ctc=38.620, loss_att=19.379, acc=0.810, loss=25.151, backward_time=0.053, grad_norm=174.512, clip=100.000, loss_scale=495.507, optim_step_time=0.033, optim0_lr0=2.915e-05, train_time=0.249 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:51:12,080 (trainer:732) INFO: 50epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.098, loss_ctc=38.269, loss_att=19.150, acc=0.811, loss=24.886, backward_time=0.054, grad_norm=179.329, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.913e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:52:43,897 (trainer:732) INFO: 50epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.099, loss_ctc=39.259, loss_att=19.748, acc=0.809, loss=25.601, backward_time=0.053, grad_norm=187.743, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.912e-05, train_time=0.256 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:54:16,582 (trainer:732) INFO: 50epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.099, loss_ctc=39.247, loss_att=19.789, acc=0.808, loss=25.626, backward_time=0.053, grad_norm=187.064, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.911e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:55:49,304 (trainer:732) INFO: 50epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.099, loss_ctc=38.342, loss_att=19.292, acc=0.811, loss=25.007, backward_time=0.053, grad_norm=188.331, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.909e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:57:21,422 (trainer:732) INFO: 50epoch:train:3939-4296batch: iter_time=0.003, forward_time=0.100, loss_ctc=41.085, loss_att=20.711, acc=0.806, loss=26.823, backward_time=0.054, grad_norm=188.390, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.908e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 19:58:53,705 (trainer:732) INFO: 50epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.100, loss_ctc=41.145, loss_att=20.710, acc=0.809, loss=26.841, backward_time=0.053, grad_norm=191.085, clip=100.000, loss_scale=378.279, optim_step_time=0.033, optim0_lr0=2.906e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:00:26,791 (trainer:732) INFO: 50epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.100, loss_ctc=40.019, loss_att=20.167, acc=0.808, loss=26.122, backward_time=0.053, grad_norm=186.200, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.905e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:01:59,742 (trainer:732) INFO: 50epoch:train:5013-5370batch: iter_time=0.008, 
forward_time=0.099, loss_ctc=38.136, loss_att=19.032, acc=0.812, loss=24.764, backward_time=0.053, grad_norm=180.123, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.903e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:03:33,402 (trainer:732) INFO: 50epoch:train:5371-5728batch: iter_time=0.009, forward_time=0.100, loss_ctc=38.806, loss_att=19.529, acc=0.808, loss=25.312, backward_time=0.053, grad_norm=190.217, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.902e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:05:05,039 (trainer:732) INFO: 50epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.098, loss_ctc=39.078, loss_att=19.663, acc=0.812, loss=25.487, backward_time=0.053, grad_norm=185.683, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.900e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:06:35,709 (trainer:732) INFO: 50epoch:train:6087-6444batch: iter_time=0.003, forward_time=0.098, loss_ctc=40.763, loss_att=20.584, acc=0.806, loss=26.638, backward_time=0.053, grad_norm=190.532, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.899e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:06:40,058 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:08:10,267 (trainer:732) INFO: 50epoch:train:6445-6802batch: iter_time=0.011, forward_time=0.099, loss_ctc=37.713, loss_att=18.971, acc=0.814, loss=24.593, backward_time=0.054, grad_norm=179.341, clip=100.000, loss_scale=968.223, optim_step_time=0.033, optim0_lr0=2.897e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:08:17,654 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:09:43,805 (trainer:732) INFO: 50epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.100, loss_ctc=39.815, loss_att=19.979, acc=0.810, loss=25.929, backward_time=0.053, grad_norm=188.800, clip=100.000, loss_scale=552.157, optim_step_time=0.033, optim0_lr0=2.896e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:10:52,360 (trainer:338) INFO: 50epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=39.191, loss_att=19.706, acc=0.809, loss=25.551, backward_time=0.053, grad_norm=185.435, clip=100.000, loss_scale=469.584, optim_step_time=0.033, optim0_lr0=2.910e-05, train_time=0.256, time=30 minutes and 38.99 seconds, total_count=358050, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=18.956, cer_ctc=0.103, loss_att=9.812, acc=0.906, cer=0.061, wer=0.740, loss=12.555, time=14.4 seconds, total_count=2650, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.51 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:10:56,118 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:10:56,140 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/40epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:10:56,140 
(trainer:272) INFO: 51/100epoch started. Estimated time to finish: 1 day, 2 hours and 45 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:12:26,888 (trainer:732) INFO: 51epoch:train:1-358batch: iter_time=0.003, forward_time=0.098, loss_ctc=39.009, loss_att=19.530, acc=0.812, loss=25.374, backward_time=0.053, grad_norm=184.317, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.895e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:13:59,177 (trainer:732) INFO: 51epoch:train:359-716batch: iter_time=0.003, forward_time=0.101, loss_ctc=37.844, loss_att=18.940, acc=0.814, loss=24.611, backward_time=0.053, grad_norm=181.895, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.893e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:15:30,497 (trainer:732) INFO: 51epoch:train:717-1074batch: iter_time=0.003, forward_time=0.099, loss_ctc=37.668, loss_att=18.889, acc=0.810, loss=24.523, backward_time=0.053, grad_norm=185.348, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.892e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:16:39,624 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:17:01,939 (trainer:732) INFO: 51epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=39.306, loss_att=19.750, acc=0.810, loss=25.617, backward_time=0.053, grad_norm=188.026, clip=100.000, loss_scale=451.048, optim_step_time=0.033, optim0_lr0=2.890e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:18:32,808 (trainer:732) INFO: 51epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.099, loss_ctc=39.084, loss_att=19.640, acc=0.813, loss=25.473, backward_time=0.053, grad_norm=189.511, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.889e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:20:03,697 (trainer:732) INFO: 51epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=38.904, loss_att=19.570, acc=0.812, loss=25.370, backward_time=0.053, grad_norm=192.446, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.887e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:21:34,613 (trainer:732) INFO: 51epoch:train:2149-2506batch: iter_time=7.947e-04, forward_time=0.099, loss_ctc=41.198, loss_att=20.733, acc=0.808, loss=26.872, backward_time=0.054, grad_norm=182.050, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.886e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:23:05,080 (trainer:732) INFO: 51epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.098, loss_ctc=38.179, loss_att=19.144, acc=0.813, loss=24.855, backward_time=0.053, grad_norm=184.526, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.884e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:24:37,052 (trainer:732) INFO: 51epoch:train:2865-3222batch: iter_time=0.004, 
forward_time=0.100, loss_ctc=39.061, loss_att=19.622, acc=0.810, loss=25.454, backward_time=0.053, grad_norm=184.938, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.883e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:26:08,965 (trainer:732) INFO: 51epoch:train:3223-3580batch: iter_time=0.005, forward_time=0.099, loss_ctc=38.928, loss_att=19.551, acc=0.810, loss=25.364, backward_time=0.053, grad_norm=181.271, clip=100.000, loss_scale=422.615, optim_step_time=0.033, optim0_lr0=2.882e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:27:41,295 (trainer:732) INFO: 51epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.099, loss_ctc=37.979, loss_att=19.044, acc=0.814, loss=24.725, backward_time=0.054, grad_norm=182.515, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.880e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:29:13,280 (trainer:732) INFO: 51epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.099, loss_ctc=38.281, loss_att=19.291, acc=0.812, loss=24.988, backward_time=0.054, grad_norm=182.525, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.879e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:30:46,189 (trainer:732) INFO: 51epoch:train:4297-4654batch: iter_time=0.008, forward_time=0.098, loss_ctc=38.683, loss_att=19.401, acc=0.812, loss=25.185, backward_time=0.055, grad_norm=181.278, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.877e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:32:18,263 (trainer:732) INFO: 51epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.099, loss_ctc=39.016, loss_att=19.594, acc=0.811, loss=25.421, backward_time=0.054, grad_norm=184.676, clip=100.000, loss_scale=512.000, 
optim_step_time=0.033, optim0_lr0=2.876e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:33:50,799 (trainer:732) INFO: 51epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.098, loss_ctc=37.531, loss_att=18.840, acc=0.811, loss=24.447, backward_time=0.055, grad_norm=183.183, clip=100.000, loss_scale=544.894, optim_step_time=0.033, optim0_lr0=2.874e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:34:16,936 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:34:33,375 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:35:22,817 (trainer:732) INFO: 51epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.099, loss_ctc=38.900, loss_att=19.513, acc=0.812, loss=25.329, backward_time=0.053, grad_norm=181.634, clip=100.000, loss_scale=656.852, optim_step_time=0.033, optim0_lr0=2.873e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:36:54,696 (trainer:732) INFO: 51epoch:train:5729-6086batch: iter_time=0.004, forward_time=0.099, loss_ctc=39.733, loss_att=20.024, acc=0.811, loss=25.936, backward_time=0.053, grad_norm=190.262, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.872e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:38:25,027 (trainer:732) INFO: 51epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.097, loss_ctc=38.086, loss_att=19.099, acc=0.812, loss=24.795, backward_time=0.054, grad_norm=183.350, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.870e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 
2023-05-13 20:39:21,261 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:39:57,878 (trainer:732) INFO: 51epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.100, loss_ctc=41.070, loss_att=20.748, acc=0.807, loss=26.845, backward_time=0.054, grad_norm=185.646, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.869e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:41:17,384 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:41:33,472 (trainer:732) INFO: 51epoch:train:6803-7160batch: iter_time=0.012, forward_time=0.100, loss_ctc=37.551, loss_att=18.830, acc=0.814, loss=24.446, backward_time=0.053, grad_norm=183.508, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.867e-05, train_time=0.267 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:42:42,292 (trainer:338) INFO: 51epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=38.785, loss_att=19.479, acc=0.811, loss=25.271, backward_time=0.053, grad_norm=184.656, clip=100.000, loss_scale=449.350, optim_step_time=0.033, optim0_lr0=2.881e-05, train_time=0.256, time=30 minutes and 38 seconds, total_count=365211, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=18.737, cer_ctc=0.102, loss_att=9.648, acc=0.906, cer=0.060, wer=0.738, loss=12.375, time=14.42 seconds, total_count=2703, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.73 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:42:45,766 (trainer:386) INFO: The best model has been 
updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:42:45,782 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/41epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:42:45,782 (trainer:272) INFO: 52/100epoch started. Estimated time to finish: 1 day, 2 hours and 13 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:44:16,290 (trainer:732) INFO: 52epoch:train:1-358batch: iter_time=0.003, forward_time=0.098, loss_ctc=37.368, loss_att=18.709, acc=0.816, loss=24.307, backward_time=0.054, grad_norm=181.102, clip=100.000, loss_scale=579.218, optim_step_time=0.033, optim0_lr0=2.866e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:44:24,767 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:44:52,048 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:45:47,573 (trainer:732) INFO: 52epoch:train:359-716batch: iter_time=0.004, forward_time=0.099, loss_ctc=36.665, loss_att=18.415, acc=0.816, loss=23.890, backward_time=0.053, grad_norm=183.224, clip=100.000, loss_scale=401.258, optim_step_time=0.033, optim0_lr0=2.865e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:47:18,561 (trainer:732) INFO: 52epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=39.255, loss_att=19.730, acc=0.813, loss=25.588, backward_time=0.053, grad_norm=189.142, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.863e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:48:48,955 (trainer:732) INFO: 52epoch:train:1075-1432batch: iter_time=9.739e-04, forward_time=0.099, loss_ctc=40.663, loss_att=20.408, acc=0.810, loss=26.485, backward_time=0.053, grad_norm=194.378, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.862e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:50:19,862 (trainer:732) INFO: 52epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.100, loss_ctc=38.543, loss_att=19.337, acc=0.814, loss=25.099, backward_time=0.054, grad_norm=187.235, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.860e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:51:51,222 (trainer:732) INFO: 52epoch:train:1791-2148batch: iter_time=0.001, forward_time=0.100, loss_ctc=39.753, loss_att=19.848, acc=0.810, loss=25.820, backward_time=0.055, grad_norm=187.139, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.859e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:53:23,304 (trainer:732) INFO: 52epoch:train:2149-2506batch: iter_time=0.003, 
forward_time=0.100, loss_ctc=38.791, loss_att=19.418, acc=0.811, loss=25.230, backward_time=0.054, grad_norm=183.865, clip=100.000, loss_scale=261.721, optim_step_time=0.033, optim0_lr0=2.858e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:54:54,666 (trainer:732) INFO: 52epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.099, loss_ctc=37.055, loss_att=18.601, acc=0.815, loss=24.138, backward_time=0.053, grad_norm=181.368, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.856e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:55:22,529 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:56:25,773 (trainer:732) INFO: 52epoch:train:2865-3222batch: iter_time=0.002, forward_time=0.099, loss_ctc=41.657, loss_att=20.950, acc=0.809, loss=27.162, backward_time=0.053, grad_norm=194.934, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.855e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:57:57,784 (trainer:732) INFO: 52epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.099, loss_ctc=36.776, loss_att=18.416, acc=0.817, loss=23.924, backward_time=0.053, grad_norm=182.900, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.853e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 20:59:30,114 (trainer:732) INFO: 52epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.099, loss_ctc=38.763, loss_att=19.512, acc=0.814, loss=25.287, backward_time=0.053, grad_norm=183.567, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.852e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 
2023-05-13 21:01:01,056 (trainer:732) INFO: 52epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.098, loss_ctc=38.691, loss_att=19.410, acc=0.813, loss=25.194, backward_time=0.054, grad_norm=184.549, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.851e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:02:10,267 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:02:33,437 (trainer:732) INFO: 52epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.100, loss_ctc=39.926, loss_att=20.080, acc=0.813, loss=26.034, backward_time=0.054, grad_norm=189.705, clip=100.000, loss_scale=606.655, optim_step_time=0.033, optim0_lr0=2.849e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:04:06,494 (trainer:732) INFO: 52epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.101, loss_ctc=38.620, loss_att=19.392, acc=0.815, loss=25.161, backward_time=0.053, grad_norm=187.711, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.848e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:05:34,517 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:05:39,349 (trainer:732) INFO: 52epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.100, loss_ctc=38.573, loss_att=19.420, acc=0.814, loss=25.166, backward_time=0.054, grad_norm=187.952, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.847e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:07:11,802 (trainer:732) INFO: 52epoch:train:5371-5728batch: iter_time=0.008, forward_time=0.098, loss_ctc=37.589, loss_att=18.877, acc=0.815, loss=24.490, backward_time=0.053, grad_norm=190.964, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.845e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:08:43,789 (trainer:732) INFO: 52epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.099, loss_ctc=37.451, loss_att=18.790, acc=0.813, loss=24.388, backward_time=0.054, grad_norm=183.860, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.844e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:10:15,185 (trainer:732) INFO: 52epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.097, loss_ctc=38.537, loss_att=19.352, acc=0.811, loss=25.107, backward_time=0.054, grad_norm=183.084, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.842e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:11:28,947 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:11:49,042 (trainer:732) INFO: 52epoch:train:6445-6802batch: iter_time=0.008, forward_time=0.100, loss_ctc=38.028, loss_att=19.184, acc=0.812, loss=24.837, backward_time=0.055, grad_norm=184.219, clip=100.000, loss_scale=741.468, optim_step_time=0.033, optim0_lr0=2.841e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:12:27,225 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:13:21,699 (trainer:732) INFO: 52epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.098, loss_ctc=37.085, loss_att=18.599, acc=0.815, loss=24.145, backward_time=0.053, grad_norm=180.475, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.840e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:14:30,577 (trainer:338) INFO: 52epoch results: [train] iter_time=0.004, forward_time=0.099, loss_ctc=38.450, loss_att=19.303, acc=0.813, loss=25.047, backward_time=0.054, grad_norm=186.056, clip=100.000, loss_scale=462.281, optim_step_time=0.033, optim0_lr0=2.853e-05, train_time=0.256, time=30 minutes and 36.54 seconds, total_count=372372, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=18.610, cer_ctc=0.101, loss_att=9.521, acc=0.908, cer=0.059, wer=0.736, loss=12.248, time=14.36 seconds, total_count=2756, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.89 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:14:34,191 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:14:34,214 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/42epoch.pth 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:14:34,214 (trainer:272) INFO: 53/100epoch started. Estimated time to finish: 1 day, 1 hour and 41 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:16:05,900 (trainer:732) INFO: 53epoch:train:1-358batch: iter_time=0.003, forward_time=0.099, loss_ctc=38.990, loss_att=19.546, acc=0.814, loss=25.380, backward_time=0.052, grad_norm=183.538, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.838e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:17:30,544 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:17:35,408 (trainer:732) INFO: 53epoch:train:359-716batch: iter_time=0.001, forward_time=0.098, loss_ctc=37.156, loss_att=18.610, acc=0.815, loss=24.174, backward_time=0.053, grad_norm=182.737, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.837e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:19:06,366 (trainer:732) INFO: 53epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=38.409, loss_att=19.229, acc=0.813, loss=24.983, backward_time=0.054, grad_norm=189.273, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.836e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:20:37,589 (trainer:732) INFO: 53epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.100, loss_ctc=39.376, loss_att=19.758, acc=0.812, loss=25.644, backward_time=0.053, grad_norm=186.052, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.834e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:21:20,453 (trainer:663) WARNING: 
The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:22:06,752 (trainer:732) INFO: 53epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.098, loss_ctc=36.879, loss_att=18.528, acc=0.816, loss=24.033, backward_time=0.053, grad_norm=181.330, clip=100.000, loss_scale=565.064, optim_step_time=0.033, optim0_lr0=2.833e-05, train_time=0.249 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:23:38,432 (trainer:732) INFO: 53epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.100, loss_ctc=39.030, loss_att=19.639, acc=0.813, loss=25.456, backward_time=0.053, grad_norm=194.634, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.832e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:25:09,798 (trainer:732) INFO: 53epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.100, loss_ctc=38.111, loss_att=19.104, acc=0.813, loss=24.806, backward_time=0.054, grad_norm=185.687, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.830e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:26:40,882 (trainer:732) INFO: 53epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.098, loss_ctc=37.873, loss_att=18.987, acc=0.817, loss=24.653, backward_time=0.053, grad_norm=182.276, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.829e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:28:13,513 (trainer:732) INFO: 53epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.101, loss_ctc=41.533, loss_att=20.867, acc=0.811, loss=27.067, backward_time=0.053, grad_norm=192.780, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.827e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:29:04,558 (preprocessor:336) WARNING: The 
length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:29:46,303 (trainer:732) INFO: 53epoch:train:3223-3580batch: iter_time=0.009, forward_time=0.098, loss_ctc=38.212, loss_att=19.161, acc=0.813, loss=24.876, backward_time=0.054, grad_norm=183.987, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.826e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:30:42,519 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:31:19,992 (trainer:732) INFO: 53epoch:train:3581-3938batch: iter_time=0.008, forward_time=0.100, loss_ctc=36.546, loss_att=18.286, acc=0.818, loss=23.764, backward_time=0.053, grad_norm=177.794, clip=100.000, loss_scale=785.927, optim_step_time=0.033, optim0_lr0=2.825e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:32:51,453 (trainer:732) INFO: 53epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.098, loss_ctc=38.584, loss_att=19.343, acc=0.815, loss=25.115, backward_time=0.053, grad_norm=185.216, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.823e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:34:22,513 (trainer:732) INFO: 53epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.098, loss_ctc=38.003, loss_att=19.022, acc=0.816, loss=24.716, backward_time=0.055, grad_norm=189.973, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.822e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:35:57,545 (trainer:732) INFO: 53epoch:train:4655-5012batch: iter_time=0.012, forward_time=0.100, loss_ctc=35.557, loss_att=17.791, acc=0.822, loss=23.121, 
backward_time=0.054, grad_norm=178.610, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.821e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:37:30,747 (trainer:732) INFO: 53epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.101, loss_ctc=39.979, loss_att=20.188, acc=0.813, loss=26.125, backward_time=0.054, grad_norm=187.297, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.819e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:39:02,252 (trainer:732) INFO: 53epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.098, loss_ctc=38.219, loss_att=19.239, acc=0.814, loss=24.933, backward_time=0.054, grad_norm=187.085, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.818e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:39:38,248 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:40:34,831 (trainer:732) INFO: 53epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.099, loss_ctc=37.489, loss_att=18.782, acc=0.815, loss=24.394, backward_time=0.053, grad_norm=183.704, clip=100.000, loss_scale=616.695, optim_step_time=0.033, optim0_lr0=2.817e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:42:08,511 (trainer:732) INFO: 53epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.099, loss_ctc=37.922, loss_att=19.059, acc=0.816, loss=24.718, backward_time=0.053, grad_norm=182.111, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.815e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:43:06,285 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:43:31,192 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:43:41,506 (trainer:732) INFO: 53epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.098, loss_ctc=36.534, loss_att=18.274, acc=0.820, loss=23.752, backward_time=0.054, grad_norm=182.853, clip=100.000, loss_scale=415.193, optim_step_time=0.033, optim0_lr0=2.814e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:45:14,366 (trainer:732) INFO: 53epoch:train:6803-7160batch: iter_time=0.006, forward_time=0.099, loss_ctc=38.675, loss_att=19.493, acc=0.813, loss=25.248, backward_time=0.054, grad_norm=189.889, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.813e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:46:23,347 (trainer:338) INFO: 53epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=38.129, loss_att=19.132, acc=0.815, loss=24.831, backward_time=0.054, grad_norm=185.354, clip=100.000, loss_scale=515.863, optim_step_time=0.033, optim0_lr0=2.825e-05, train_time=0.257, time=30 minutes and 40.89 seconds, total_count=379533, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=18.420, cer_ctc=0.099, loss_att=9.464, acc=0.908, cer=0.059, wer=0.734, loss=12.151, time=14.69 seconds, total_count=2809, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.55 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:46:26,962 (trainer:384) INFO: There are no improvements in this epoch [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:46:26,982 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/43epoch.pth 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:46:26,982 (trainer:272) INFO: 54/100epoch started. Estimated time to finish: 1 day, 1 hour and 8 minutes [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:47:59,374 (trainer:732) INFO: 54epoch:train:1-358batch: iter_time=0.004, forward_time=0.100, loss_ctc=38.110, loss_att=19.111, acc=0.815, loss=24.811, backward_time=0.053, grad_norm=186.197, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.811e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:49:28,965 (trainer:732) INFO: 54epoch:train:359-716batch: iter_time=0.001, forward_time=0.098, loss_ctc=37.759, loss_att=18.849, acc=0.816, loss=24.522, backward_time=0.053, grad_norm=183.950, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=2.810e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:51:00,448 (trainer:732) INFO: 54epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=37.304, loss_att=18.664, acc=0.816, loss=24.256, backward_time=0.054, grad_norm=183.243, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.809e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:51:39,657 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:52:32,389 (trainer:732) INFO: 54epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.100, loss_ctc=38.529, loss_att=19.300, acc=0.816, loss=25.069, backward_time=0.053, grad_norm=187.225, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.807e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:54:04,501 (trainer:732) INFO: 54epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.100, loss_ctc=37.552, loss_att=18.799, acc=0.817, loss=24.425, backward_time=0.053, grad_norm=189.196, clip=100.000, loss_scale=459.084, optim_step_time=0.033, optim0_lr0=2.806e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:55:33,550 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:55:36,500 (trainer:732) INFO: 54epoch:train:1791-2148batch: iter_time=0.004, forward_time=0.099, loss_ctc=39.232, loss_att=19.719, acc=0.814, loss=25.573, backward_time=0.053, grad_norm=189.050, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.805e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:56:22,617 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:57:09,096 (trainer:732) INFO: 54epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.100, loss_ctc=38.813, loss_att=19.415, acc=0.817, loss=25.235, backward_time=0.053, grad_norm=191.837, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.804e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 21:58:39,666 (trainer:732) INFO: 54epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.098, loss_ctc=37.906, loss_att=18.923, acc=0.815, loss=24.618, backward_time=0.054, grad_norm=183.937, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.802e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:00:13,448 (trainer:732) INFO: 54epoch:train:2865-3222batch: iter_time=0.011, forward_time=0.098, loss_ctc=34.927, loss_att=17.462, acc=0.821, loss=22.702, backward_time=0.056, grad_norm=175.144, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.801e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:01:45,962 (trainer:732) INFO: 54epoch:train:3223-3580batch: iter_time=0.008, forward_time=0.098, loss_ctc=36.021, loss_att=18.040, acc=0.822, loss=23.434, backward_time=0.053, grad_norm=185.050, clip=100.000, loss_scale=617.832, optim_step_time=0.033, optim0_lr0=2.800e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:02:39,594 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:03:19,161 (trainer:732) INFO: 54epoch:train:3581-3938batch: iter_time=0.008, forward_time=0.099, loss_ctc=36.747, loss_att=18.365, acc=0.818, loss=23.879, backward_time=0.054, grad_norm=180.677, clip=100.000, loss_scale=804.571, optim_step_time=0.033, optim0_lr0=2.798e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:04:52,149 (trainer:732) INFO: 54epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.100, loss_ctc=38.825, loss_att=19.433, acc=0.818, loss=25.251, backward_time=0.056, grad_norm=190.820, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.797e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:06:23,712 (trainer:732) INFO: 54epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.098, loss_ctc=37.290, loss_att=18.751, acc=0.819, loss=24.313, backward_time=0.054, grad_norm=188.511, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.796e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:07:55,147 (trainer:732) INFO: 54epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.098, loss_ctc=38.434, loss_att=19.255, acc=0.815, loss=25.009, backward_time=0.054, grad_norm=187.611, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.794e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:09:27,035 (trainer:732) INFO: 54epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.098, loss_ctc=38.056, loss_att=19.143, acc=0.816, loss=24.817, backward_time=0.053, grad_norm=183.192, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.793e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:10:59,551 (trainer:732) INFO: 54epoch:train:5371-5728batch: iter_time=0.007, 
forward_time=0.098, loss_ctc=38.373, loss_att=19.190, acc=0.818, loss=24.945, backward_time=0.053, grad_norm=184.974, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.792e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:11:39,823 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:12:32,225 (trainer:732) INFO: 54epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.098, loss_ctc=38.389, loss_att=19.254, acc=0.816, loss=24.994, backward_time=0.054, grad_norm=194.331, clip=100.000, loss_scale=649.681, optim_step_time=0.033, optim0_lr0=2.790e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:14:06,086 (trainer:732) INFO: 54epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.099, loss_ctc=37.854, loss_att=18.979, acc=0.817, loss=24.642, backward_time=0.053, grad_norm=187.387, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.789e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:15:39,678 (trainer:732) INFO: 54epoch:train:6445-6802batch: iter_time=0.009, forward_time=0.099, loss_ctc=37.535, loss_att=18.837, acc=0.817, loss=24.447, backward_time=0.054, grad_norm=184.508, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.788e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:17:12,339 (trainer:732) INFO: 54epoch:train:6803-7160batch: iter_time=0.005, forward_time=0.100, loss_ctc=38.629, loss_att=19.311, acc=0.818, loss=25.107, backward_time=0.054, grad_norm=187.160, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.787e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:18:21,037 (trainer:338) INFO: 54epoch results: [train] iter_time=0.005, 
forward_time=0.099, loss_ctc=37.799, loss_att=18.932, acc=0.817, loss=24.592, backward_time=0.054, grad_norm=186.213, clip=100.000, loss_scale=484.895, optim_step_time=0.033, optim0_lr0=2.799e-05, train_time=0.257, time=30 minutes and 46.1 seconds, total_count=386694, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=18.107, cer_ctc=0.098, loss_att=9.362, acc=0.909, cer=0.057, wer=0.726, loss=11.986, time=14.5 seconds, total_count=2862, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.44 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:18:24,851 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:18:24,875 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/44epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:18:24,875 (trainer:272) INFO: 55/100epoch started. Estimated time to finish: 1 day, 36 minutes and 34.58 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:19:56,969 (trainer:732) INFO: 55epoch:train:1-358batch: iter_time=0.006, forward_time=0.098, loss_ctc=37.642, loss_att=18.862, acc=0.819, loss=24.496, backward_time=0.054, grad_norm=180.906, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.785e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:21:28,601 (trainer:732) INFO: 55epoch:train:359-716batch: iter_time=0.002, forward_time=0.100, loss_ctc=38.716, loss_att=19.368, acc=0.818, loss=25.173, backward_time=0.053, grad_norm=189.148, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.784e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:22:06,165 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:22:59,583 (trainer:732) INFO: 55epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=38.161, loss_att=19.049, acc=0.819, loss=24.783, backward_time=0.054, grad_norm=181.522, clip=100.000, loss_scale=714.218, optim_step_time=0.033, optim0_lr0=2.783e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:24:22,679 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:24:31,023 (trainer:732) INFO: 55epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.099, loss_ctc=38.132, loss_att=19.114, acc=0.819, loss=24.819, backward_time=0.055, grad_norm=191.863, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.781e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:26:03,830 (trainer:732) INFO: 55epoch:train:1433-1790batch: iter_time=0.007, forward_time=0.099, loss_ctc=37.933, loss_att=18.934, acc=0.822, loss=24.634, backward_time=0.054, grad_norm=187.411, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.780e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:26:46,778 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:27:35,266 (trainer:732) INFO: 55epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.099, loss_ctc=39.758, loss_att=19.960, acc=0.815, loss=25.900, backward_time=0.054, grad_norm=197.168, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.779e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:28:41,082 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:29:06,450 (trainer:732) INFO: 55epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.098, loss_ctc=37.180, loss_att=18.602, acc=0.819, loss=24.175, backward_time=0.057, grad_norm=184.763, clip=100.000, loss_scale=441.008, optim_step_time=0.033, optim0_lr0=2.778e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:30:39,709 (trainer:732) INFO: 55epoch:train:2507-2864batch: iter_time=0.007, forward_time=0.100, loss_ctc=36.515, loss_att=18.242, acc=0.820, loss=23.723, backward_time=0.055, grad_norm=189.055, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.776e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:32:12,447 (trainer:732) INFO: 55epoch:train:2865-3222batch: iter_time=0.007, forward_time=0.099, loss_ctc=37.740, loss_att=18.858, acc=0.817, loss=24.523, backward_time=0.053, grad_norm=186.191, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.775e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:33:46,099 (trainer:732) INFO: 55epoch:train:3223-3580batch: iter_time=0.009, forward_time=0.100, loss_ctc=35.876, loss_att=17.953, acc=0.820, loss=23.330, backward_time=0.053, grad_norm=181.803, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.774e-05, train_time=0.261 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:35:18,624 (trainer:732) INFO: 55epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.100, loss_ctc=38.027, loss_att=19.044, acc=0.817, loss=24.739, backward_time=0.053, grad_norm=187.769, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.772e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:36:50,475 (trainer:732) INFO: 55epoch:train:3939-4296batch: iter_time=0.007, forward_time=0.098, loss_ctc=35.377, loss_att=17.723, acc=0.821, loss=23.019, backward_time=0.053, grad_norm=180.095, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.771e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:38:22,205 (trainer:732) INFO: 55epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.098, loss_ctc=37.682, loss_att=18.886, acc=0.817, loss=24.525, backward_time=0.053, grad_norm=187.995, clip=100.000, loss_scale=432.626, optim_step_time=0.033, optim0_lr0=2.770e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:39:55,136 (trainer:732) INFO: 55epoch:train:4655-5012batch: iter_time=0.008, forward_time=0.099, loss_ctc=37.917, loss_att=18.972, acc=0.816, loss=24.655, backward_time=0.053, grad_norm=187.472, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.769e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:41:28,826 (trainer:732) INFO: 55epoch:train:5013-5370batch: iter_time=0.009, forward_time=0.099, loss_ctc=38.585, loss_att=19.249, acc=0.818, loss=25.050, backward_time=0.053, grad_norm=187.178, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.767e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:43:01,511 (trainer:732) INFO: 55epoch:train:5371-5728batch: iter_time=0.009, 
forward_time=0.098, loss_ctc=36.656, loss_att=18.298, acc=0.818, loss=23.805, backward_time=0.053, grad_norm=184.360, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.766e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:44:33,863 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:44:35,960 (trainer:732) INFO: 55epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.102, loss_ctc=38.742, loss_att=19.458, acc=0.817, loss=25.244, backward_time=0.053, grad_norm=189.182, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=2.765e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:46:00,921 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:46:08,718 (trainer:732) INFO: 55epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.099, loss_ctc=37.596, loss_att=18.826, acc=0.819, loss=24.457, backward_time=0.055, grad_norm=187.462, clip=100.000, loss_scale=519.171, optim_step_time=0.033, optim0_lr0=2.764e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:47:42,336 (trainer:732) INFO: 55epoch:train:6445-6802batch: iter_time=0.013, forward_time=0.098, loss_ctc=35.475, loss_att=17.804, acc=0.819, loss=23.105, backward_time=0.053, grad_norm=181.162, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.762e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:49:16,453 (trainer:732) INFO: 55epoch:train:6803-7160batch: iter_time=0.011, forward_time=0.099, loss_ctc=36.656, loss_att=18.414, acc=0.822, loss=23.887, backward_time=0.053, grad_norm=188.075, 
clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.761e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:50:24,713 (trainer:338) INFO: 55epoch results: [train] iter_time=0.006, forward_time=0.099, loss_ctc=37.493, loss_att=18.768, acc=0.819, loss=24.386, backward_time=0.054, grad_norm=186.522, clip=100.000, loss_scale=450.915, optim_step_time=0.033, optim0_lr0=2.773e-05, train_time=0.258, time=30 minutes and 52.23 seconds, total_count=393855, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=17.822, cer_ctc=0.097, loss_att=9.223, acc=0.910, cer=0.057, wer=0.724, loss=11.803, time=14.21 seconds, total_count=2915, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.39 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:50:28,541 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:50:28,566 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/45epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:50:28,566 (trainer:272) INFO: 56/100epoch started. 
Estimated time to finish: 1 day, 4 minutes and 26.68 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:52:01,143 (trainer:732) INFO: 56epoch:train:1-358batch: iter_time=0.004, forward_time=0.100, loss_ctc=37.137, loss_att=18.605, acc=0.820, loss=24.165, backward_time=0.055, grad_norm=183.701, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.760e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:53:31,098 (trainer:732) INFO: 56epoch:train:359-716batch: iter_time=0.002, forward_time=0.098, loss_ctc=35.131, loss_att=17.429, acc=0.824, loss=22.740, backward_time=0.054, grad_norm=182.351, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.759e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:54:21,664 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:54:39,957 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:55:02,413 (trainer:732) INFO: 56epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=37.550, loss_att=18.743, acc=0.820, loss=24.385, backward_time=0.053, grad_norm=184.096, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.757e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:56:34,043 (trainer:732) INFO: 56epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=37.954, loss_att=19.023, acc=0.818, loss=24.702, backward_time=0.053, grad_norm=184.054, clip=100.000, loss_scale=769.430, optim_step_time=0.033, optim0_lr0=2.756e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:57:32,048 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:58:03,959 (trainer:732) INFO: 56epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.098, loss_ctc=37.401, loss_att=18.684, acc=0.818, loss=24.300, backward_time=0.053, grad_norm=180.020, clip=100.000, loss_scale=840.426, optim_step_time=0.033, optim0_lr0=2.755e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 22:59:34,863 (trainer:732) INFO: 56epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=39.081, loss_att=19.545, acc=0.816, loss=25.406, backward_time=0.053, grad_norm=188.674, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.754e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:01:06,550 (trainer:732) INFO: 56epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.099, loss_ctc=37.728, loss_att=18.857, acc=0.820, loss=24.518, backward_time=0.053, grad_norm=187.418, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.752e-05, train_time=0.256 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:01:12,213 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:02:38,843 (trainer:732) INFO: 56epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.100, loss_ctc=36.938, loss_att=18.446, acc=0.821, loss=23.994, backward_time=0.053, grad_norm=185.834, clip=100.000, loss_scale=270.342, optim_step_time=0.033, optim0_lr0=2.751e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:04:10,345 (trainer:732) INFO: 56epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.099, loss_ctc=38.223, loss_att=19.103, acc=0.820, loss=24.839, backward_time=0.053, grad_norm=190.137, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.750e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:05:34,771 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:05:42,025 (trainer:732) INFO: 56epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.098, loss_ctc=36.893, loss_att=18.439, acc=0.823, loss=23.975, backward_time=0.053, grad_norm=185.189, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.749e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:07:14,516 (trainer:732) INFO: 56epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.099, loss_ctc=36.385, loss_att=18.206, acc=0.821, loss=23.660, backward_time=0.055, grad_norm=186.518, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.747e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:08:46,916 (trainer:732) INFO: 56epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.099, loss_ctc=36.951, loss_att=18.488, acc=0.820, loss=24.027, backward_time=0.053, grad_norm=187.944, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.746e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:10:18,813 (trainer:732) INFO: 56epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.099, loss_ctc=37.991, loss_att=18.982, acc=0.819, loss=24.684, backward_time=0.053, grad_norm=189.471, clip=100.000, loss_scale=346.816, optim_step_time=0.033, optim0_lr0=2.745e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:11:51,244 (trainer:732) INFO: 56epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.099, loss_ctc=37.445, loss_att=18.769, acc=0.821, loss=24.372, backward_time=0.054, grad_norm=184.946, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.744e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:13:22,914 (trainer:732) INFO: 56epoch:train:5013-5370batch: iter_time=0.005, 
forward_time=0.099, loss_ctc=37.845, loss_att=19.002, acc=0.818, loss=24.655, backward_time=0.053, grad_norm=188.698, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.742e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:14:54,426 (trainer:732) INFO: 56epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.098, loss_ctc=36.786, loss_att=18.423, acc=0.821, loss=23.932, backward_time=0.053, grad_norm=183.256, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.741e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:16:27,859 (trainer:732) INFO: 56epoch:train:5729-6086batch: iter_time=0.010, forward_time=0.098, loss_ctc=35.903, loss_att=17.958, acc=0.820, loss=23.342, backward_time=0.053, grad_norm=178.846, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.740e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:18:01,491 (trainer:732) INFO: 56epoch:train:6087-6444batch: iter_time=0.006, forward_time=0.100, loss_ctc=38.203, loss_att=19.161, acc=0.819, loss=24.873, backward_time=0.054, grad_norm=189.255, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.739e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:18:28,407 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:19:35,832 (trainer:732) INFO: 56epoch:train:6445-6802batch: iter_time=0.010, forward_time=0.100, loss_ctc=36.735, loss_att=18.351, acc=0.821, loss=23.866, backward_time=0.053, grad_norm=180.172, clip=100.000, loss_scale=536.381, optim_step_time=0.033, optim0_lr0=2.738e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:21:09,554 (trainer:732) INFO: 56epoch:train:6803-7160batch: iter_time=0.010, forward_time=0.099, loss_ctc=36.742, loss_att=18.390, acc=0.822, loss=23.896, backward_time=0.053, grad_norm=189.877, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.736e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:22:18,067 (trainer:338) INFO: 56epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=37.239, loss_att=18.624, acc=0.820, loss=24.209, backward_time=0.053, grad_norm=185.540, clip=100.000, loss_scale=470.943, optim_step_time=0.033, optim0_lr0=2.748e-05, train_time=0.257, time=30 minutes and 41.72 seconds, total_count=401016, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=17.673, cer_ctc=0.096, loss_att=9.170, acc=0.911, cer=0.057, wer=0.723, loss=11.721, time=14.38 seconds, total_count=2968, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.4 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:22:21,553 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:22:21,588 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/46epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:22:21,589 (trainer:272) INFO: 57/100epoch started. 
Estimated time to finish: 23 hours, 32 minutes and 10.24 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:23:53,822 (trainer:732) INFO: 57epoch:train:1-358batch: iter_time=0.005, forward_time=0.099, loss_ctc=37.120, loss_att=18.561, acc=0.823, loss=24.128, backward_time=0.053, grad_norm=187.176, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.735e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:25:26,101 (trainer:732) INFO: 57epoch:train:359-716batch: iter_time=0.005, forward_time=0.100, loss_ctc=36.162, loss_att=18.024, acc=0.824, loss=23.466, backward_time=0.053, grad_norm=185.156, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.734e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:26:56,923 (trainer:732) INFO: 57epoch:train:717-1074batch: iter_time=0.001, forward_time=0.099, loss_ctc=37.134, loss_att=18.491, acc=0.824, loss=24.084, backward_time=0.053, grad_norm=187.303, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.733e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:28:29,024 (trainer:732) INFO: 57epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=35.569, loss_att=17.834, acc=0.822, loss=23.154, backward_time=0.056, grad_norm=179.397, clip=100.000, loss_scale=580.648, optim_step_time=0.033, optim0_lr0=2.731e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:28:35,581 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:28:39,824 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:30:00,465 (trainer:732) INFO: 57epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.100, loss_ctc=38.183, loss_att=19.101, acc=0.820, loss=24.826, backward_time=0.054, grad_norm=190.222, clip=100.000, loss_scale=570.801, optim_step_time=0.033, optim0_lr0=2.730e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:31:31,557 (trainer:732) INFO: 57epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=36.005, loss_att=17.991, acc=0.824, loss=23.395, backward_time=0.054, grad_norm=182.552, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.729e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:33:01,899 (trainer:732) INFO: 57epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.097, loss_ctc=36.085, loss_att=18.082, acc=0.824, loss=23.482, backward_time=0.055, grad_norm=187.647, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.728e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:33:34,611 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:34:33,474 (trainer:732) INFO: 57epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.099, loss_ctc=37.834, loss_att=18.945, acc=0.820, loss=24.612, backward_time=0.054, grad_norm=188.572, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.727e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:36:04,689 (trainer:732) INFO: 57epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.098, loss_ctc=35.674, loss_att=17.761, acc=0.825, loss=23.135, backward_time=0.053, grad_norm=186.599, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.725e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:37:37,792 (trainer:732) INFO: 57epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.101, loss_ctc=37.412, loss_att=18.619, acc=0.821, loss=24.257, backward_time=0.054, grad_norm=190.324, clip=100.000, loss_scale=663.598, optim_step_time=0.033, optim0_lr0=2.724e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:38:11,427 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:39:08,813 (trainer:732) INFO: 57epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.098, loss_ctc=36.529, loss_att=18.276, acc=0.823, loss=23.752, backward_time=0.054, grad_norm=184.687, clip=100.000, loss_scale=704.179, optim_step_time=0.033, optim0_lr0=2.723e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:40:40,971 (trainer:732) INFO: 57epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.099, loss_ctc=36.938, loss_att=18.473, acc=0.824, loss=24.013, backward_time=0.054, grad_norm=187.192, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.722e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:42:12,098 (trainer:732) INFO: 57epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.099, loss_ctc=37.173, loss_att=18.584, acc=0.820, loss=24.160, backward_time=0.053, grad_norm=189.637, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.721e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:43:43,306 (trainer:732) INFO: 57epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.098, loss_ctc=36.979, loss_att=18.426, acc=0.824, loss=23.992, backward_time=0.054, grad_norm=186.995, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.719e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:45:16,369 (trainer:732) INFO: 57epoch:train:5013-5370batch: iter_time=0.008, forward_time=0.099, loss_ctc=36.578, loss_att=18.330, acc=0.819, loss=23.804, backward_time=0.053, grad_norm=181.051, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.718e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:45:39,131 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:46:49,411 (trainer:732) INFO: 57epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.099, loss_ctc=36.609, loss_att=18.303, acc=0.821, loss=23.795, backward_time=0.054, grad_norm=178.988, clip=100.000, loss_scale=530.592, optim_step_time=0.033, optim0_lr0=2.717e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:47:03,395 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:48:21,381 (trainer:732) INFO: 57epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.098, loss_ctc=37.782, loss_att=18.947, acc=0.816, loss=24.598, backward_time=0.054, grad_norm=183.168, clip=100.000, loss_scale=589.445, optim_step_time=0.033, optim0_lr0=2.716e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:49:54,136 (trainer:732) INFO: 57epoch:train:6087-6444batch: iter_time=0.006, forward_time=0.099, loss_ctc=37.557, loss_att=18.809, acc=0.821, loss=24.433, backward_time=0.053, grad_norm=185.320, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.715e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:51:26,274 (trainer:732) INFO: 57epoch:train:6445-6802batch: iter_time=0.008, forward_time=0.098, loss_ctc=36.468, loss_att=18.254, acc=0.822, loss=23.718, backward_time=0.053, grad_norm=184.866, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.713e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:52:59,435 (trainer:732) INFO: 57epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.099, loss_ctc=38.728, loss_att=19.433, acc=0.818, loss=25.221, backward_time=0.053, grad_norm=193.060, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.712e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:54:08,221 (trainer:338) INFO: 57epoch results: [train] iter_time=0.004, forward_time=0.099, loss_ctc=36.919, loss_att=18.459, acc=0.822, loss=23.997, backward_time=0.054, grad_norm=185.991, clip=100.000, loss_scale=540.325, optim_step_time=0.033, optim0_lr0=2.724e-05, train_time=0.256, time=30 minutes and 38.55 seconds, total_count=408177, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=17.523, cer_ctc=0.095, loss_att=9.067, acc=0.912, cer=0.056, wer=0.717, loss=11.604, time=14.52 seconds, total_count=3021, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.55 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:54:11,720 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:54:11,746 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/47epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:54:11,746 (trainer:272) INFO: 58/100epoch started. Estimated time to finish: 22 hours, 59 minutes and 52.41 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:54:54,724 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:55:44,102 (trainer:732) INFO: 58epoch:train:1-358batch: iter_time=0.002, forward_time=0.101, loss_ctc=39.362, loss_att=19.555, acc=0.819, loss=25.497, backward_time=0.053, grad_norm=191.666, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.711e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:57:03,583 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:57:15,023 (trainer:732) INFO: 58epoch:train:359-716batch: iter_time=5.909e-04, forward_time=0.100, loss_ctc=37.402, loss_att=18.669, acc=0.821, loss=24.289, backward_time=0.053, grad_norm=184.969, clip=100.000, loss_scale=579.406, optim_step_time=0.033, optim0_lr0=2.710e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:58:21,207 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-13 23:58:46,040 (trainer:732) INFO: 58epoch:train:717-1074batch: iter_time=6.934e-04, forward_time=0.100, loss_ctc=37.248, loss_att=18.580, acc=0.823, loss=24.180, backward_time=0.053, grad_norm=187.887, clip=100.000, loss_scale=441.725, optim_step_time=0.033, optim0_lr0=2.709e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:00:17,338 (trainer:732) INFO: 58epoch:train:1075-1432batch: iter_time=0.004, forward_time=0.099, loss_ctc=35.765, loss_att=17.840, acc=0.824, loss=23.217, backward_time=0.053, grad_norm=181.861, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.707e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:01:48,504 (trainer:732) INFO: 58epoch:train:1433-1790batch: iter_time=0.004, forward_time=0.098, loss_ctc=36.180, loss_att=18.099, acc=0.823, loss=23.523, backward_time=0.053, grad_norm=186.334, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.706e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:03:19,614 (trainer:732) INFO: 58epoch:train:1791-2148batch: iter_time=0.001, forward_time=0.099, loss_ctc=37.122, loss_att=18.500, acc=0.824, loss=24.087, backward_time=0.054, grad_norm=186.091, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.705e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:04:50,307 (trainer:732) INFO: 58epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.098, loss_ctc=38.377, loss_att=19.174, acc=0.821, loss=24.935, backward_time=0.053, grad_norm=186.448, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.704e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:06:23,023 (trainer:732) INFO: 58epoch:train:2507-2864batch: iter_time=0.004, 
forward_time=0.100, loss_ctc=37.147, loss_att=18.587, acc=0.823, loss=24.155, backward_time=0.053, grad_norm=189.143, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.703e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:06:41,457 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:07:54,302 (trainer:732) INFO: 58epoch:train:2865-3222batch: iter_time=0.006, forward_time=0.098, loss_ctc=35.251, loss_att=17.550, acc=0.823, loss=22.860, backward_time=0.054, grad_norm=184.214, clip=100.000, loss_scale=431.911, optim_step_time=0.033, optim0_lr0=2.702e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:09:26,398 (trainer:732) INFO: 58epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.100, loss_ctc=37.683, loss_att=18.926, acc=0.819, loss=24.553, backward_time=0.054, grad_norm=191.425, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.700e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:10:57,798 (trainer:732) INFO: 58epoch:train:3581-3938batch: iter_time=0.002, forward_time=0.099, loss_ctc=38.436, loss_att=19.330, acc=0.819, loss=25.062, backward_time=0.053, grad_norm=193.542, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.699e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:12:29,921 (trainer:732) INFO: 58epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.100, loss_ctc=36.142, loss_att=18.135, acc=0.822, loss=23.537, backward_time=0.053, grad_norm=190.258, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.698e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 
2023-05-14 00:14:03,308 (trainer:732) INFO: 58epoch:train:4297-4654batch: iter_time=0.007, forward_time=0.100, loss_ctc=35.228, loss_att=17.615, acc=0.825, loss=22.899, backward_time=0.055, grad_norm=181.355, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.697e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:15:35,533 (trainer:732) INFO: 58epoch:train:4655-5012batch: iter_time=0.010, forward_time=0.097, loss_ctc=34.462, loss_att=17.231, acc=0.827, loss=22.400, backward_time=0.054, grad_norm=184.572, clip=100.000, loss_scale=563.486, optim_step_time=0.033, optim0_lr0=2.696e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:16:02,881 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:16:55,101 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:17:08,196 (trainer:732) INFO: 58epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.100, loss_ctc=37.055, loss_att=18.515, acc=0.823, loss=24.077, backward_time=0.053, grad_norm=192.206, clip=100.000, loss_scale=662.588, optim_step_time=0.033, optim0_lr0=2.695e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:18:40,488 (trainer:732) INFO: 58epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.099, loss_ctc=36.220, loss_att=18.180, acc=0.826, loss=23.592, backward_time=0.055, grad_norm=187.144, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.693e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:20:13,293 (trainer:732) INFO: 58epoch:train:5729-6086batch: iter_time=0.011, forward_time=0.097, loss_ctc=34.347, loss_att=17.159, acc=0.827, loss=22.316, backward_time=0.055, grad_norm=179.572, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.692e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:21:46,920 (trainer:732) INFO: 58epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.100, loss_ctc=36.867, loss_att=18.420, acc=0.826, loss=23.954, backward_time=0.053, grad_norm=188.190, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.691e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:23:19,767 (trainer:732) INFO: 58epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.099, loss_ctc=37.023, loss_att=18.385, acc=0.824, loss=23.977, backward_time=0.053, grad_norm=182.757, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.690e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:24:54,408 (trainer:732) INFO: 58epoch:train:6803-7160batch: iter_time=0.013, 
forward_time=0.099, loss_ctc=35.883, loss_att=17.927, acc=0.826, loss=23.314, backward_time=0.054, grad_norm=180.495, clip=100.000, loss_scale=572.067, optim_step_time=0.033, optim0_lr0=2.689e-05, train_time=0.264 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:26:02,989 (trainer:338) INFO: 58epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=36.631, loss_att=18.304, acc=0.823, loss=23.802, backward_time=0.053, grad_norm=186.501, clip=100.000, loss_scale=456.995, optim_step_time=0.033, optim0_lr0=2.700e-05, train_time=0.257, time=30 minutes and 43.33 seconds, total_count=415338, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=17.260, cer_ctc=0.093, loss_att=8.975, acc=0.912, cer=0.056, wer=0.711, loss=11.460, time=14.36 seconds, total_count=3074, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.54 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:26:06,706 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:26:06,734 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/48epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:26:06,734 (trainer:272) INFO: 59/100epoch started. Estimated time to finish: 22 hours, 27 minutes and 39.19 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:26:13,000 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:27:07,294 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:27:37,225 (trainer:732) INFO: 59epoch:train:1-358batch: iter_time=0.003, forward_time=0.097, loss_ctc=33.770, loss_att=16.766, acc=0.831, loss=21.867, backward_time=0.056, grad_norm=179.316, clip=100.000, loss_scale=540.683, optim_step_time=0.033, optim0_lr0=2.688e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:29:07,583 (trainer:732) INFO: 59epoch:train:359-716batch: iter_time=0.001, forward_time=0.098, loss_ctc=36.510, loss_att=18.194, acc=0.825, loss=23.689, backward_time=0.054, grad_norm=188.966, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.686e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:30:40,098 (trainer:732) INFO: 59epoch:train:717-1074batch: iter_time=0.005, forward_time=0.100, loss_ctc=36.476, loss_att=18.224, acc=0.825, loss=23.699, backward_time=0.054, grad_norm=186.083, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.685e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:32:10,895 (trainer:732) INFO: 59epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.099, loss_ctc=36.592, loss_att=18.253, acc=0.822, loss=23.755, backward_time=0.053, grad_norm=185.219, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.684e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:33:41,161 (trainer:732) INFO: 59epoch:train:1433-1790batch: iter_time=0.001, forward_time=0.099, loss_ctc=36.270, loss_att=18.071, acc=0.826, loss=23.531, backward_time=0.053, grad_norm=184.683, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.683e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:33:47,389 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM 
on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:34:57,899 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:35:12,846 (trainer:732) INFO: 59epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.100, loss_ctc=38.186, loss_att=19.094, acc=0.824, loss=24.821, backward_time=0.054, grad_norm=190.526, clip=100.000, loss_scale=609.524, optim_step_time=0.033, optim0_lr0=2.682e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:36:44,973 (trainer:732) INFO: 59epoch:train:2149-2506batch: iter_time=0.005, forward_time=0.099, loss_ctc=36.611, loss_att=18.331, acc=0.823, loss=23.815, backward_time=0.053, grad_norm=180.802, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.681e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:38:17,088 (trainer:732) INFO: 59epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.100, loss_ctc=35.551, loss_att=17.687, acc=0.826, loss=23.046, backward_time=0.053, grad_norm=185.420, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.680e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:39:49,249 (trainer:732) INFO: 59epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.099, loss_ctc=38.121, loss_att=19.057, acc=0.823, loss=24.776, backward_time=0.053, grad_norm=185.948, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.678e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:41:21,662 (trainer:732) INFO: 59epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.099, loss_ctc=35.850, loss_att=17.846, acc=0.826, loss=23.247, backward_time=0.053, grad_norm=182.784, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.677e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:42:54,637 (trainer:732) INFO: 59epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.099, loss_ctc=36.279, loss_att=18.128, acc=0.822, loss=23.573, backward_time=0.055, grad_norm=185.552, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.676e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:44:26,726 (trainer:732) INFO: 59epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.099, loss_ctc=37.728, loss_att=18.819, acc=0.823, loss=24.492, backward_time=0.055, grad_norm=189.920, clip=100.000, loss_scale=806.615, optim_step_time=0.033, optim0_lr0=2.675e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:44:38,237 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:45:10,413 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:45:59,617 (trainer:732) INFO: 59epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.100, loss_ctc=37.700, loss_att=18.908, acc=0.824, loss=24.545, backward_time=0.054, grad_norm=188.660, clip=100.000, loss_scale=572.235, optim_step_time=0.033, optim0_lr0=2.674e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:47:31,357 (trainer:732) INFO: 59epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.099, loss_ctc=37.571, loss_att=18.869, acc=0.822, loss=24.480, backward_time=0.054, grad_norm=188.175, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.673e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:49:02,720 (trainer:732) INFO: 59epoch:train:5013-5370batch: iter_time=0.006, forward_time=0.098, loss_ctc=34.661, loss_att=17.215, acc=0.826, loss=22.449, backward_time=0.053, grad_norm=180.651, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.672e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:50:34,685 (trainer:732) INFO: 59epoch:train:5371-5728batch: iter_time=0.007, forward_time=0.098, loss_ctc=35.726, loss_att=17.768, acc=0.827, loss=23.156, backward_time=0.054, grad_norm=182.470, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.670e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:52:05,932 (trainer:732) INFO: 59epoch:train:5729-6086batch: iter_time=0.004, forward_time=0.099, loss_ctc=37.975, loss_att=19.045, acc=0.823, loss=24.724, backward_time=0.053, grad_norm=190.962, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.669e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:53:31,877 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:53:39,553 (trainer:732) INFO: 59epoch:train:6087-6444batch: iter_time=0.008, forward_time=0.100, loss_ctc=37.095, loss_att=18.539, acc=0.824, loss=24.106, backward_time=0.054, grad_norm=195.525, clip=100.000, loss_scale=619.563, optim_step_time=0.033, optim0_lr0=2.668e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:55:14,647 (trainer:732) INFO: 59epoch:train:6445-6802batch: iter_time=0.015, forward_time=0.098, loss_ctc=34.953, loss_att=17.466, acc=0.826, loss=22.712, backward_time=0.053, grad_norm=184.028, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.667e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:56:48,850 (trainer:732) INFO: 59epoch:train:6803-7160batch: iter_time=0.010, forward_time=0.099, loss_ctc=34.682, loss_att=17.288, acc=0.829, loss=22.506, backward_time=0.054, grad_norm=182.090, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.666e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:57:57,877 (trainer:338) INFO: 59epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=36.386, loss_att=18.163, acc=0.825, loss=23.630, backward_time=0.054, grad_norm=185.888, clip=100.000, loss_scale=541.402, optim_step_time=0.033, optim0_lr0=2.677e-05, train_time=0.257, time=30 minutes and 42.78 seconds, total_count=422499, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=17.228, cer_ctc=0.092, loss_att=8.923, acc=0.914, cer=0.055, wer=0.709, loss=11.414, time=14.51 seconds, total_count=3127, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.85 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:58:01,587 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 
2023-05-14 00:58:01,613 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/49epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:58:01,613 (trainer:272) INFO: 60/100epoch started. Estimated time to finish: 21 hours, 55 minutes and 26.53 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 00:59:33,043 (trainer:732) INFO: 60epoch:train:1-358batch: iter_time=0.003, forward_time=0.099, loss_ctc=36.317, loss_att=18.190, acc=0.824, loss=23.628, backward_time=0.053, grad_norm=190.390, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.665e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:01:02,725 (trainer:732) INFO: 60epoch:train:359-716batch: iter_time=0.002, forward_time=0.098, loss_ctc=34.660, loss_att=17.182, acc=0.830, loss=22.425, backward_time=0.053, grad_norm=180.082, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.664e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:02:34,018 (trainer:732) INFO: 60epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=35.492, loss_att=17.766, acc=0.827, loss=23.084, backward_time=0.054, grad_norm=186.702, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.662e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:04:04,236 (trainer:732) INFO: 60epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.098, loss_ctc=35.694, loss_att=17.795, acc=0.825, loss=23.165, backward_time=0.055, grad_norm=179.798, clip=100.000, loss_scale=766.570, optim_step_time=0.033, optim0_lr0=2.661e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:04:05,024 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:05:35,986 (trainer:732) INFO: 60epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.100, loss_ctc=37.933, loss_att=18.838, acc=0.824, loss=24.566, backward_time=0.053, grad_norm=189.126, clip=100.000, loss_scale=514.868, optim_step_time=0.034, optim0_lr0=2.660e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:07:07,629 (trainer:732) INFO: 60epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.100, loss_ctc=36.000, loss_att=17.891, acc=0.829, loss=23.324, backward_time=0.054, grad_norm=184.432, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.659e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:08:39,191 (trainer:732) INFO: 60epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.100, loss_ctc=37.009, loss_att=18.475, acc=0.823, loss=24.035, backward_time=0.053, grad_norm=191.119, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.658e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:10:11,126 (trainer:732) INFO: 60epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.100, loss_ctc=38.390, loss_att=19.102, acc=0.823, loss=24.888, backward_time=0.053, grad_norm=188.149, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.657e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:11:30,307 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:11:43,320 (trainer:732) INFO: 60epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.100, loss_ctc=35.058, loss_att=17.480, acc=0.829, loss=22.753, backward_time=0.054, grad_norm=182.450, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.656e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:12:46,285 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:13:15,832 (trainer:732) INFO: 60epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.099, loss_ctc=35.856, loss_att=17.812, acc=0.827, loss=23.225, backward_time=0.053, grad_norm=183.697, clip=100.000, loss_scale=559.328, optim_step_time=0.033, optim0_lr0=2.655e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:14:49,295 (trainer:732) INFO: 60epoch:train:3581-3938batch: iter_time=0.009, forward_time=0.100, loss_ctc=34.918, loss_att=17.437, acc=0.825, loss=22.682, backward_time=0.054, grad_norm=186.913, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.653e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:16:21,522 (trainer:732) INFO: 60epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.100, loss_ctc=36.254, loss_att=18.129, acc=0.825, loss=23.566, backward_time=0.054, grad_norm=189.512, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.652e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:17:55,820 (trainer:732) INFO: 60epoch:train:4297-4654batch: iter_time=0.007, forward_time=0.101, loss_ctc=36.089, loss_att=18.031, acc=0.826, loss=23.448, backward_time=0.054, grad_norm=191.358, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.651e-05, train_time=0.263 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:19:28,016 (trainer:732) INFO: 60epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.099, loss_ctc=35.451, loss_att=17.724, acc=0.825, loss=23.042, backward_time=0.054, grad_norm=187.074, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.650e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:20:58,807 (trainer:732) INFO: 60epoch:train:5013-5370batch: iter_time=0.002, forward_time=0.099, loss_ctc=36.727, loss_att=18.306, acc=0.827, loss=23.833, backward_time=0.053, grad_norm=187.486, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.649e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:21:41,632 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:22:29,385 (trainer:732) INFO: 60epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.097, loss_ctc=36.093, loss_att=17.955, acc=0.827, loss=23.397, backward_time=0.053, grad_norm=189.861, clip=100.000, loss_scale=612.392, optim_step_time=0.033, optim0_lr0=2.648e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:24:02,536 (trainer:732) INFO: 60epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.101, loss_ctc=37.065, loss_att=18.490, acc=0.827, loss=24.062, backward_time=0.053, grad_norm=187.810, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.647e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:25:35,861 (trainer:732) INFO: 60epoch:train:6087-6444batch: iter_time=0.008, forward_time=0.099, loss_ctc=35.089, loss_att=17.525, acc=0.829, loss=22.795, backward_time=0.053, grad_norm=190.214, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=2.646e-05, train_time=0.260 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:26:49,538 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:27:07,870 (trainer:732) INFO: 60epoch:train:6445-6802batch: iter_time=0.004, forward_time=0.100, loss_ctc=37.054, loss_att=18.580, acc=0.826, loss=24.122, backward_time=0.053, grad_norm=189.590, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.645e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:27:37,830 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:28:40,310 (trainer:732) INFO: 60epoch:train:6803-7160batch: iter_time=0.006, forward_time=0.099, loss_ctc=36.136, loss_att=18.056, acc=0.825, loss=23.480, backward_time=0.055, grad_norm=185.963, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.644e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:29:48,913 (trainer:338) INFO: 60epoch results: [train] iter_time=0.004, forward_time=0.099, loss_ctc=36.149, loss_att=18.031, acc=0.826, loss=23.466, backward_time=0.054, grad_norm=187.116, clip=100.000, loss_scale=532.243, optim_step_time=0.033, optim0_lr0=2.654e-05, train_time=0.257, time=30 minutes and 39.37 seconds, total_count=429660, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=17.056, cer_ctc=0.091, loss_att=8.865, acc=0.914, cer=0.055, wer=0.707, loss=11.322, time=14.4 seconds, total_count=3180, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.52 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 
01:29:52,403 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:29:52,428 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/50epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:29:52,428 (trainer:272) INFO: 61/100epoch started. Estimated time to finish: 21 hours, 23 minutes and 11.69 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:31:25,422 (trainer:732) INFO: 61epoch:train:1-358batch: iter_time=0.002, forward_time=0.101, loss_ctc=38.394, loss_att=19.253, acc=0.822, loss=24.995, backward_time=0.054, grad_norm=197.819, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=2.642e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:31:54,655 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:32:56,970 (trainer:732) INFO: 61epoch:train:359-716batch: iter_time=0.002, forward_time=0.100, loss_ctc=36.083, loss_att=18.008, acc=0.826, loss=23.430, backward_time=0.054, grad_norm=191.460, clip=100.000, loss_scale=645.378, optim_step_time=0.033, optim0_lr0=2.641e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:34:27,657 (trainer:732) INFO: 61epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=35.109, loss_att=17.386, acc=0.831, loss=22.703, backward_time=0.053, grad_norm=184.475, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.640e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:35:20,248 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:35:32,081 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:35:58,735 (trainer:732) INFO: 61epoch:train:1075-1432batch: iter_time=8.649e-04, forward_time=0.099, loss_ctc=36.313, loss_att=18.056, acc=0.828, loss=23.533, backward_time=0.055, grad_norm=189.900, clip=100.000, loss_scale=403.720, optim_step_time=0.033, optim0_lr0=2.639e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:37:29,721 (trainer:732) INFO: 61epoch:train:1433-1790batch: iter_time=5.713e-04, forward_time=0.100, loss_ctc=37.085, loss_att=18.506, acc=0.827, loss=24.080, backward_time=0.054, grad_norm=192.159, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.638e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:37:46,439 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:39:00,334 (trainer:732) INFO: 61epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.099, loss_ctc=34.546, loss_att=17.149, acc=0.830, loss=22.368, backward_time=0.053, grad_norm=183.557, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.637e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:40:31,483 (trainer:732) INFO: 61epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.100, loss_ctc=36.051, loss_att=18.006, acc=0.829, loss=23.420, backward_time=0.053, grad_norm=185.814, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.636e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:42:03,364 (trainer:732) INFO: 61epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.099, loss_ctc=36.396, loss_att=18.126, acc=0.828, loss=23.607, backward_time=0.054, grad_norm=187.539, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.635e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:43:35,865 (trainer:732) INFO: 61epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.100, loss_ctc=36.803, loss_att=18.349, acc=0.826, loss=23.886, backward_time=0.055, grad_norm=190.579, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.634e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:45:08,558 (trainer:732) INFO: 61epoch:train:3223-3580batch: iter_time=0.008, forward_time=0.099, loss_ctc=35.703, loss_att=17.868, acc=0.824, loss=23.218, backward_time=0.053, grad_norm=188.685, clip=100.000, loss_scale=469.810, optim_step_time=0.033, optim0_lr0=2.633e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:46:40,883 (trainer:732) INFO: 61epoch:train:3581-3938batch: iter_time=0.005, 
forward_time=0.099, loss_ctc=35.326, loss_att=17.601, acc=0.830, loss=22.918, backward_time=0.055, grad_norm=189.089, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.631e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:48:14,450 (trainer:732) INFO: 61epoch:train:3939-4296batch: iter_time=0.005, forward_time=0.101, loss_ctc=35.432, loss_att=17.633, acc=0.830, loss=22.973, backward_time=0.054, grad_norm=193.756, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.630e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:49:45,817 (trainer:732) INFO: 61epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.099, loss_ctc=36.951, loss_att=18.392, acc=0.826, loss=23.960, backward_time=0.054, grad_norm=188.654, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.629e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:51:18,017 (trainer:732) INFO: 61epoch:train:4655-5012batch: iter_time=0.004, forward_time=0.100, loss_ctc=36.642, loss_att=18.245, acc=0.828, loss=23.764, backward_time=0.053, grad_norm=187.925, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.628e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:52:19,309 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:52:49,292 (trainer:732) INFO: 61epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.098, loss_ctc=35.786, loss_att=17.828, acc=0.826, loss=23.215, backward_time=0.053, grad_norm=182.639, clip=100.000, loss_scale=425.950, optim_step_time=0.033, optim0_lr0=2.627e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:54:21,310 (trainer:732) INFO: 61epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.098, loss_ctc=35.909, loss_att=17.840, acc=0.826, loss=23.261, backward_time=0.053, grad_norm=183.764, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.626e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:54:33,973 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:55:54,818 (trainer:732) INFO: 61epoch:train:5729-6086batch: iter_time=0.009, forward_time=0.099, loss_ctc=35.223, loss_att=17.574, acc=0.830, loss=22.869, backward_time=0.054, grad_norm=186.277, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.625e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:57:28,971 (trainer:732) INFO: 61epoch:train:6087-6444batch: iter_time=0.010, forward_time=0.099, loss_ctc=34.293, loss_att=17.012, acc=0.832, loss=22.196, backward_time=0.054, grad_norm=185.307, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.624e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 01:59:00,385 (trainer:732) INFO: 61epoch:train:6445-6802batch: iter_time=0.006, forward_time=0.097, loss_ctc=35.809, loss_att=17.900, acc=0.825, loss=23.273, backward_time=0.055, grad_norm=185.991, clip=100.000, 
loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.623e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:00:34,436 (trainer:732) INFO: 61epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.100, loss_ctc=34.081, loss_att=17.018, acc=0.829, loss=22.137, backward_time=0.053, grad_norm=185.547, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=2.622e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:01:43,451 (trainer:338) INFO: 61epoch results: [train] iter_time=0.004, forward_time=0.099, loss_ctc=35.880, loss_att=17.879, acc=0.828, loss=23.279, backward_time=0.054, grad_norm=188.045, clip=100.000, loss_scale=378.778, optim_step_time=0.033, optim0_lr0=2.632e-05, train_time=0.257, time=30 minutes and 42.68 seconds, total_count=436821, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=16.887, cer_ctc=0.090, loss_att=8.762, acc=0.914, cer=0.054, wer=0.707, loss=11.200, time=14.65 seconds, total_count=3233, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.68 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:01:47,029 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:01:47,056 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/51epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:01:47,056 (trainer:272) INFO: 62/100epoch started. 
Estimated time to finish: 20 hours, 51 minutes and 0.18 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:03:18,083 (trainer:732) INFO: 62epoch:train:1-358batch: iter_time=0.002, forward_time=0.099, loss_ctc=37.289, loss_att=18.463, acc=0.827, loss=24.111, backward_time=0.053, grad_norm=190.070, clip=100.000, loss_scale=448.358, optim_step_time=0.033, optim0_lr0=2.621e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:04:49,430 (trainer:732) INFO: 62epoch:train:359-716batch: iter_time=0.001, forward_time=0.100, loss_ctc=35.684, loss_att=17.787, acc=0.829, loss=23.156, backward_time=0.053, grad_norm=187.602, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=2.620e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:05:52,745 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:06:20,971 (trainer:732) INFO: 62epoch:train:717-1074batch: iter_time=0.002, forward_time=0.100, loss_ctc=35.397, loss_att=17.607, acc=0.831, loss=22.944, backward_time=0.053, grad_norm=186.542, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=2.619e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:07:51,491 (trainer:732) INFO: 62epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.099, loss_ctc=36.030, loss_att=18.022, acc=0.827, loss=23.424, backward_time=0.053, grad_norm=189.406, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=2.617e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:09:14,808 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:09:22,478 (trainer:732) INFO: 62epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.100, loss_ctc=34.292, loss_att=17.073, acc=0.832, loss=22.239, backward_time=0.053, grad_norm=184.392, clip=100.000, loss_scale=491.204, optim_step_time=0.033, optim0_lr0=2.616e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:10:52,687 (trainer:732) INFO: 62epoch:train:1791-2148batch: iter_time=0.001, forward_time=0.099, loss_ctc=33.987, loss_att=16.961, acc=0.829, loss=22.068, backward_time=0.053, grad_norm=185.878, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.615e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:12:25,723 (trainer:732) INFO: 62epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.102, loss_ctc=36.247, loss_att=18.062, acc=0.830, loss=23.517, backward_time=0.053, grad_norm=184.199, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=2.614e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:13:57,278 (trainer:732) INFO: 62epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.099, loss_ctc=36.949, loss_att=18.422, acc=0.828, loss=23.980, backward_time=0.053, grad_norm=188.336, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.613e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:15:29,256 (trainer:732) INFO: 62epoch:train:2865-3222batch: iter_time=0.006, forward_time=0.099, loss_ctc=34.751, loss_att=17.248, acc=0.832, loss=22.499, backward_time=0.053, grad_norm=181.149, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.612e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:16:11,318 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:17:00,875 (trainer:732) INFO: 62epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.099, loss_ctc=35.985, loss_att=17.963, acc=0.828, loss=23.370, backward_time=0.053, grad_norm=193.929, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.611e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:18:33,087 (trainer:732) INFO: 62epoch:train:3581-3938batch: iter_time=0.005, forward_time=0.099, loss_ctc=35.050, loss_att=17.461, acc=0.830, loss=22.738, backward_time=0.054, grad_norm=186.009, clip=100.000, loss_scale=382.570, optim_step_time=0.033, optim0_lr0=2.610e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:20:06,204 (trainer:732) INFO: 62epoch:train:3939-4296batch: iter_time=0.006, forward_time=0.100, loss_ctc=35.677, loss_att=17.849, acc=0.829, loss=23.197, backward_time=0.053, grad_norm=182.492, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.609e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:21:37,017 (trainer:732) INFO: 62epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.097, loss_ctc=35.485, loss_att=17.611, acc=0.827, loss=22.973, backward_time=0.053, grad_norm=188.552, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.608e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:23:08,623 (trainer:732) INFO: 62epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.099, loss_ctc=36.398, loss_att=18.108, acc=0.829, loss=23.595, backward_time=0.053, grad_norm=187.494, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.607e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 
02:24:40,026 (trainer:732) INFO: 62epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.099, loss_ctc=36.397, loss_att=18.165, acc=0.826, loss=23.634, backward_time=0.054, grad_norm=189.494, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.606e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:26:12,179 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:26:12,253 (trainer:732) INFO: 62epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.099, loss_ctc=35.500, loss_att=17.718, acc=0.827, loss=23.053, backward_time=0.054, grad_norm=191.776, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.605e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:26:44,893 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:27:45,773 (trainer:732) INFO: 62epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.100, loss_ctc=36.208, loss_att=18.083, acc=0.827, loss=23.520, backward_time=0.053, grad_norm=189.222, clip=100.000, loss_scale=641.076, optim_step_time=0.033, optim0_lr0=2.604e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:29:19,720 (trainer:732) INFO: 62epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.100, loss_ctc=35.876, loss_att=17.910, acc=0.829, loss=23.300, backward_time=0.053, grad_norm=190.799, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.603e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:30:54,132 (trainer:732) INFO: 62epoch:train:6445-6802batch: iter_time=0.011, forward_time=0.099, loss_ctc=34.925, loss_att=17.334, acc=0.831, loss=22.611, backward_time=0.053, grad_norm=187.828, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.602e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:32:27,583 (trainer:732) INFO: 62epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.099, loss_ctc=35.345, loss_att=17.683, acc=0.828, loss=22.981, backward_time=0.053, grad_norm=183.081, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.601e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:33:36,161 (trainer:338) INFO: 62epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=35.661, loss_att=17.770, acc=0.829, loss=23.137, backward_time=0.053, grad_norm=187.416, clip=100.000, loss_scale=443.736, optim_step_time=0.033, optim0_lr0=2.611e-05, train_time=0.257, time=30 minutes and 41.21 seconds, total_count=443982, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=16.912, cer_ctc=0.090, loss_att=8.733, acc=0.915, cer=0.054, wer=0.708, 
loss=11.187, time=14.47 seconds, total_count=3286, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.42 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:33:39,922 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:33:39,948 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/53epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:33:39,948 (trainer:272) INFO: 63/100epoch started. Estimated time to finish: 20 hours, 18 minutes and 48.13 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:35:10,846 (trainer:732) INFO: 63epoch:train:1-358batch: iter_time=0.003, forward_time=0.098, loss_ctc=36.214, loss_att=18.098, acc=0.828, loss=23.533, backward_time=0.053, grad_norm=194.600, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.599e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:36:41,328 (trainer:732) INFO: 63epoch:train:359-716batch: iter_time=8.620e-04, forward_time=0.099, loss_ctc=37.550, loss_att=18.643, acc=0.827, loss=24.315, backward_time=0.053, grad_norm=192.184, clip=100.000, loss_scale=547.754, optim_step_time=0.034, optim0_lr0=2.598e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:36:50,527 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:38:12,095 (trainer:732) INFO: 63epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=36.162, loss_att=17.978, acc=0.831, loss=23.434, backward_time=0.052, grad_norm=188.882, clip=100.000, loss_scale=563.630, optim_step_time=0.033, optim0_lr0=2.597e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:39:43,762 (trainer:732) INFO: 63epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=34.039, loss_att=16.917, acc=0.834, loss=22.054, backward_time=0.054, grad_norm=183.647, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.596e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:41:15,248 (trainer:732) INFO: 63epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.100, loss_ctc=33.966, loss_att=16.898, acc=0.831, loss=22.018, backward_time=0.053, grad_norm=181.493, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=2.595e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:42:06,456 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:42:46,797 (trainer:732) INFO: 63epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.099, loss_ctc=35.433, loss_att=17.580, acc=0.831, loss=22.935, backward_time=0.053, grad_norm=190.324, clip=100.000, loss_scale=512.000, optim_step_time=0.034, optim0_lr0=2.594e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:44:17,567 (trainer:732) INFO: 63epoch:train:2149-2506batch: iter_time=0.006, forward_time=0.097, loss_ctc=33.151, loss_att=16.465, acc=0.833, loss=21.471, backward_time=0.053, grad_norm=184.735, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.593e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:45:29,533 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:45:50,722 (trainer:732) INFO: 63epoch:train:2507-2864batch: iter_time=0.007, forward_time=0.100, loss_ctc=35.497, loss_att=17.662, acc=0.830, loss=23.013, backward_time=0.053, grad_norm=185.281, clip=100.000, loss_scale=552.157, optim_step_time=0.033, optim0_lr0=2.592e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:46:27,003 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:47:22,002 (trainer:732) INFO: 63epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.099, loss_ctc=35.147, loss_att=17.533, acc=0.832, loss=22.817, backward_time=0.053, grad_norm=189.589, clip=100.000, loss_scale=512.000, optim_step_time=0.032, optim0_lr0=2.591e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:48:53,037 (trainer:732) INFO: 63epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.099, loss_ctc=36.395, loss_att=18.143, acc=0.830, loss=23.619, backward_time=0.054, grad_norm=197.952, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.590e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:50:27,011 (trainer:732) INFO: 63epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.101, loss_ctc=36.809, loss_att=18.382, acc=0.828, loss=23.910, backward_time=0.056, grad_norm=190.378, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.589e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:51:59,333 (trainer:732) INFO: 63epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.100, loss_ctc=36.070, loss_att=17.959, acc=0.828, loss=23.393, backward_time=0.053, grad_norm=193.471, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.588e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:53:31,254 (trainer:732) INFO: 63epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.099, loss_ctc=35.301, loss_att=17.552, acc=0.831, loss=22.877, backward_time=0.053, grad_norm=190.114, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.587e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:54:47,809 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:55:02,633 (trainer:732) INFO: 63epoch:train:4655-5012batch: iter_time=0.003, forward_time=0.099, loss_ctc=37.045, loss_att=18.542, acc=0.828, loss=24.093, backward_time=0.053, grad_norm=189.602, clip=100.000, loss_scale=757.244, optim_step_time=0.033, optim0_lr0=2.586e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:56:34,097 (trainer:732) INFO: 63epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.098, loss_ctc=34.930, loss_att=17.361, acc=0.830, loss=22.632, backward_time=0.053, grad_norm=179.544, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.585e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:58:07,040 (trainer:732) INFO: 63epoch:train:5371-5728batch: iter_time=0.003, forward_time=0.101, loss_ctc=36.628, loss_att=18.287, acc=0.828, loss=23.789, backward_time=0.054, grad_norm=193.364, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.584e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 02:59:41,001 (trainer:732) INFO: 63epoch:train:5729-6086batch: iter_time=0.013, forward_time=0.098, loss_ctc=33.699, loss_att=16.749, acc=0.830, loss=21.834, backward_time=0.055, grad_norm=178.519, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.583e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:01:12,419 (trainer:732) INFO: 63epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.098, loss_ctc=35.531, loss_att=17.624, acc=0.831, loss=22.996, backward_time=0.053, grad_norm=190.717, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.582e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:02:45,417 (trainer:732) INFO: 63epoch:train:6445-6802batch: iter_time=0.009, 
forward_time=0.099, loss_ctc=34.374, loss_att=17.062, acc=0.832, loss=22.256, backward_time=0.053, grad_norm=185.625, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.581e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:03:32,128 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:04:19,240 (trainer:732) INFO: 63epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.100, loss_ctc=35.580, loss_att=17.726, acc=0.828, loss=23.082, backward_time=0.053, grad_norm=180.339, clip=100.000, loss_scale=806.615, optim_step_time=0.033, optim0_lr0=2.580e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:05:29,087 (trainer:338) INFO: 63epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=35.446, loss_att=17.642, acc=0.830, loss=22.983, backward_time=0.053, grad_norm=188.014, clip=100.000, loss_scale=545.404, optim_step_time=0.033, optim0_lr0=2.590e-05, train_time=0.257, time=30 minutes and 39.93 seconds, total_count=451143, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=16.632, cer_ctc=0.089, loss_att=8.682, acc=0.916, cer=0.053, wer=0.704, loss=11.067, time=15.6 seconds, total_count=3339, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.6 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:05:32,856 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:05:32,883 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/52epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:05:32,883 (trainer:272) INFO: 64/100epoch started. 
Estimated time to finish: 19 hours, 46 minutes and 36.74 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:05:38,986 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:06:54,927 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:07:05,383 (trainer:732) INFO: 64epoch:train:1-358batch: iter_time=0.002, forward_time=0.101, loss_ctc=37.793, loss_att=18.800, acc=0.829, loss=24.498, backward_time=0.053, grad_norm=196.947, clip=100.000, loss_scale=512.719, optim_step_time=0.033, optim0_lr0=2.579e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:08:35,557 (trainer:732) INFO: 64epoch:train:359-716batch: iter_time=0.001, forward_time=0.099, loss_ctc=34.498, loss_att=17.168, acc=0.831, loss=22.367, backward_time=0.053, grad_norm=183.614, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.578e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:10:08,355 (trainer:732) INFO: 64epoch:train:717-1074batch: iter_time=0.001, forward_time=0.102, loss_ctc=36.308, loss_att=18.178, acc=0.828, loss=23.617, backward_time=0.053, grad_norm=195.791, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.577e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:11:39,802 (trainer:732) INFO: 64epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.100, loss_ctc=35.081, loss_att=17.424, acc=0.832, loss=22.721, backward_time=0.053, grad_norm=187.969, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.576e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:11:58,164 (preprocessor:336) WARNING: The length of the text output exceeds 100, 
which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:13:10,487 (trainer:732) INFO: 64epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.098, loss_ctc=33.937, loss_att=16.821, acc=0.834, loss=21.956, backward_time=0.053, grad_norm=181.593, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.575e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:14:41,551 (trainer:732) INFO: 64epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=34.755, loss_att=17.292, acc=0.832, loss=22.530, backward_time=0.053, grad_norm=187.345, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=2.574e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:16:11,116 (trainer:732) INFO: 64epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.097, loss_ctc=35.386, loss_att=17.551, acc=0.833, loss=22.901, backward_time=0.053, grad_norm=187.109, clip=100.000, loss_scale=391.151, optim_step_time=0.033, optim0_lr0=2.573e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:17:41,607 (trainer:732) INFO: 64epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.098, loss_ctc=35.195, loss_att=17.489, acc=0.832, loss=22.801, backward_time=0.053, grad_norm=186.522, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.572e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:18:07,875 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:19:14,466 (trainer:732) INFO: 64epoch:train:2865-3222batch: iter_time=0.005, forward_time=0.100, loss_ctc=35.278, loss_att=17.542, acc=0.832, loss=22.863, backward_time=0.054, grad_norm=188.025, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.571e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:20:46,571 (trainer:732) INFO: 64epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.100, loss_ctc=37.573, loss_att=18.675, acc=0.830, loss=24.344, backward_time=0.053, grad_norm=193.594, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.570e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:22:19,781 (trainer:732) INFO: 64epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.101, loss_ctc=36.776, loss_att=18.305, acc=0.830, loss=23.846, backward_time=0.055, grad_norm=194.446, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.569e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:23:07,507 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:23:51,944 (trainer:732) INFO: 64epoch:train:3939-4296batch: iter_time=0.003, forward_time=0.099, loss_ctc=35.167, loss_att=17.484, acc=0.830, loss=22.789, backward_time=0.057, grad_norm=185.799, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.568e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:24:02,858 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:25:24,673 (trainer:732) INFO: 64epoch:train:4297-4654batch: iter_time=0.010, forward_time=0.098, loss_ctc=33.203, loss_att=16.498, acc=0.834, loss=21.510, backward_time=0.054, grad_norm=186.152, clip=100.000, loss_scale=537.815, optim_step_time=0.033, optim0_lr0=2.567e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:26:57,765 (trainer:732) INFO: 64epoch:train:4655-5012batch: iter_time=0.009, forward_time=0.099, loss_ctc=32.518, loss_att=16.167, acc=0.836, loss=21.073, backward_time=0.053, grad_norm=184.737, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.566e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:28:30,124 (trainer:732) INFO: 64epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.100, loss_ctc=36.789, loss_att=18.333, acc=0.828, loss=23.870, backward_time=0.053, grad_norm=191.337, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.565e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:29:30,915 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:30:02,369 (trainer:732) INFO: 64epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.099, loss_ctc=35.344, loss_att=17.675, acc=0.828, loss=22.976, backward_time=0.053, grad_norm=187.310, clip=100.000, loss_scale=425.950, optim_step_time=0.033, optim0_lr0=2.564e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:31:34,775 (trainer:732) INFO: 64epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.098, loss_ctc=34.143, loss_att=16.955, acc=0.834, loss=22.111, backward_time=0.053, grad_norm=189.330, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.563e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:33:07,261 (trainer:732) INFO: 64epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.099, loss_ctc=34.684, loss_att=17.330, acc=0.829, loss=22.536, backward_time=0.053, grad_norm=189.281, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.562e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:34:40,683 (trainer:732) INFO: 64epoch:train:6445-6802batch: iter_time=0.008, forward_time=0.100, loss_ctc=35.402, loss_att=17.665, acc=0.832, loss=22.986, backward_time=0.053, grad_norm=181.147, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.561e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:36:13,141 (trainer:732) INFO: 64epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.099, loss_ctc=34.956, loss_att=17.367, acc=0.832, loss=22.644, backward_time=0.053, grad_norm=184.569, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.560e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:37:21,968 (trainer:338) INFO: 64epoch results: [train] iter_time=0.005, forward_time=0.099, 
loss_ctc=35.205, loss_att=17.519, acc=0.831, loss=22.825, backward_time=0.053, grad_norm=188.116, clip=100.000, loss_scale=387.702, optim_step_time=0.033, optim0_lr0=2.569e-05, train_time=0.257, time=30 minutes and 40.95 seconds, total_count=458304, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=16.576, cer_ctc=0.088, loss_att=8.611, acc=0.916, cer=0.053, wer=0.703, loss=11.001, time=14.46 seconds, total_count=3392, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.67 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:37:25,724 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:37:25,751 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/54epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:37:25,751 (trainer:272) INFO: 65/100epoch started. Estimated time to finish: 19 hours, 14 minutes and 25.9 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:38:57,001 (trainer:732) INFO: 65epoch:train:1-358batch: iter_time=0.002, forward_time=0.100, loss_ctc=34.934, loss_att=17.318, acc=0.832, loss=22.603, backward_time=0.054, grad_norm=186.509, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.559e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:40:28,669 (trainer:732) INFO: 65epoch:train:359-716batch: iter_time=0.005, forward_time=0.099, loss_ctc=32.659, loss_att=16.198, acc=0.836, loss=21.137, backward_time=0.054, grad_norm=183.402, clip=100.000, loss_scale=448.358, optim_step_time=0.033, optim0_lr0=2.558e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:41:59,839 (trainer:732) INFO: 65epoch:train:717-1074batch: iter_time=0.001, forward_time=0.099, loss_ctc=34.998, loss_att=17.442, acc=0.831, loss=22.709, 
backward_time=0.055, grad_norm=188.428, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.557e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:43:31,488 (trainer:732) INFO: 65epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=35.107, loss_att=17.470, acc=0.832, loss=22.761, backward_time=0.054, grad_norm=189.042, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.556e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:43:41,801 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:45:03,354 (trainer:732) INFO: 65epoch:train:1433-1790batch: iter_time=7.576e-04, forward_time=0.101, loss_ctc=36.529, loss_att=18.172, acc=0.832, loss=23.679, backward_time=0.054, grad_norm=186.959, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.555e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:46:04,713 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:46:34,301 (trainer:732) INFO: 65epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=35.459, loss_att=17.607, acc=0.832, loss=22.963, backward_time=0.053, grad_norm=186.450, clip=100.000, loss_scale=428.101, optim_step_time=0.033, optim0_lr0=2.554e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:48:06,535 (trainer:732) INFO: 65epoch:train:2149-2506batch: iter_time=0.003, forward_time=0.100, loss_ctc=35.858, loss_att=17.905, acc=0.830, loss=23.291, backward_time=0.053, grad_norm=191.412, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.553e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:49:37,058 (trainer:732) INFO: 65epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.099, loss_ctc=34.338, loss_att=17.046, acc=0.833, loss=22.233, backward_time=0.054, grad_norm=185.059, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.552e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:51:07,947 (trainer:732) INFO: 65epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.099, loss_ctc=36.042, loss_att=17.857, acc=0.832, loss=23.312, backward_time=0.053, grad_norm=189.175, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.551e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:52:20,727 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:52:40,591 (trainer:732) INFO: 65epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.100, loss_ctc=35.721, loss_att=17.679, acc=0.832, loss=23.091, backward_time=0.054, grad_norm=189.741, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.550e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:54:13,619 (trainer:732) INFO: 65epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.100, loss_ctc=33.878, loss_att=16.893, acc=0.833, loss=21.989, backward_time=0.054, grad_norm=187.961, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.549e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:55:45,407 (trainer:732) INFO: 65epoch:train:3939-4296batch: iter_time=0.009, forward_time=0.097, loss_ctc=34.224, loss_att=16.938, acc=0.833, loss=22.124, backward_time=0.053, grad_norm=181.906, clip=100.000, loss_scale=445.497, optim_step_time=0.033, optim0_lr0=2.548e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:57:18,197 (trainer:732) INFO: 65epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.100, loss_ctc=35.296, loss_att=17.521, acc=0.834, loss=22.853, backward_time=0.053, grad_norm=185.018, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.547e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 03:58:51,435 (trainer:732) INFO: 65epoch:train:4655-5012batch: iter_time=0.007, forward_time=0.100, loss_ctc=34.901, loss_att=17.298, acc=0.834, loss=22.579, backward_time=0.055, grad_norm=186.016, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.546e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:00:12,406 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:00:22,720 (trainer:732) INFO: 65epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.098, loss_ctc=34.971, loss_att=17.359, acc=0.836, loss=22.643, backward_time=0.053, grad_norm=182.840, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.545e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:01:56,103 (trainer:732) INFO: 65epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.100, loss_ctc=36.671, loss_att=18.275, acc=0.830, loss=23.794, backward_time=0.054, grad_norm=191.688, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.544e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:03:13,720 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:03:26,901 (trainer:732) INFO: 65epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.097, loss_ctc=34.142, loss_att=16.965, acc=0.834, loss=22.118, backward_time=0.053, grad_norm=187.984, clip=100.000, loss_scale=514.868, optim_step_time=0.033, optim0_lr0=2.543e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:04:59,907 (trainer:732) INFO: 65epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.100, loss_ctc=35.382, loss_att=17.577, acc=0.834, loss=22.918, backward_time=0.054, grad_norm=182.967, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.542e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:06:32,123 (trainer:732) INFO: 65epoch:train:6445-6802batch: iter_time=0.006, forward_time=0.098, loss_ctc=35.290, loss_att=17.582, acc=0.831, loss=22.895, backward_time=0.055, grad_norm=189.624, clip=100.000, 
loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.541e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:08:04,962 (trainer:732) INFO: 65epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.099, loss_ctc=33.214, loss_att=16.485, acc=0.836, loss=21.504, backward_time=0.053, grad_norm=185.887, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.540e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:09:13,421 (trainer:338) INFO: 65epoch results: [train] iter_time=0.004, forward_time=0.099, loss_ctc=34.968, loss_att=17.373, acc=0.833, loss=22.652, backward_time=0.054, grad_norm=186.910, clip=100.000, loss_scale=424.640, optim_step_time=0.033, optim0_lr0=2.549e-05, train_time=0.257, time=30 minutes and 39.88 seconds, total_count=465465, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=16.207, cer_ctc=0.086, loss_att=8.494, acc=0.918, cer=0.052, wer=0.689, loss=10.808, time=14.19 seconds, total_count=3445, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.6 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:09:16,892 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:09:16,918 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/55epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:09:16,918 (trainer:272) INFO: 66/100epoch started. Estimated time to finish: 18 hours, 42 minutes and 14.69 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:09:21,415 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:10:47,807 (trainer:732) INFO: 66epoch:train:1-358batch: iter_time=0.004, forward_time=0.098, loss_ctc=34.417, loss_att=17.119, acc=0.831, loss=22.308, backward_time=0.052, grad_norm=186.039, clip=100.000, loss_scale=266.039, optim_step_time=0.033, optim0_lr0=2.539e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:12:18,454 (trainer:732) INFO: 66epoch:train:359-716batch: iter_time=0.003, forward_time=0.098, loss_ctc=34.177, loss_att=16.951, acc=0.836, loss=22.118, backward_time=0.053, grad_norm=182.756, clip=100.000, loss_scale=256.000, optim_step_time=0.032, optim0_lr0=2.538e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:13:15,589 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:13:48,472 (trainer:732) INFO: 66epoch:train:717-1074batch: iter_time=0.001, forward_time=0.098, loss_ctc=35.117, loss_att=17.477, acc=0.833, loss=22.769, backward_time=0.053, grad_norm=187.692, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.537e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:15:19,917 (trainer:732) INFO: 66epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.100, loss_ctc=35.201, loss_att=17.543, acc=0.832, loss=22.840, backward_time=0.053, grad_norm=187.208, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.536e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:16:50,072 (trainer:732) INFO: 66epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.098, loss_ctc=34.557, loss_att=17.122, acc=0.834, loss=22.352, backward_time=0.053, grad_norm=192.287, clip=100.000, 
loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.535e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:18:21,949 (trainer:732) INFO: 66epoch:train:1791-2148batch: iter_time=0.001, forward_time=0.101, loss_ctc=36.133, loss_att=18.007, acc=0.830, loss=23.445, backward_time=0.053, grad_norm=188.653, clip=100.000, loss_scale=351.106, optim_step_time=0.033, optim0_lr0=2.534e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:19:52,982 (trainer:732) INFO: 66epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.100, loss_ctc=34.568, loss_att=17.209, acc=0.831, loss=22.417, backward_time=0.053, grad_norm=187.143, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.533e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:21:24,180 (trainer:732) INFO: 66epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.099, loss_ctc=33.882, loss_att=16.804, acc=0.836, loss=21.927, backward_time=0.054, grad_norm=181.065, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.532e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:22:54,792 (trainer:732) INFO: 66epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.098, loss_ctc=34.473, loss_att=17.102, acc=0.835, loss=22.313, backward_time=0.053, grad_norm=186.773, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.531e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:24:27,708 (trainer:732) INFO: 66epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.100, loss_ctc=33.881, loss_att=16.791, acc=0.836, loss=21.918, backward_time=0.054, grad_norm=182.198, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.530e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:25:59,200 
(trainer:732) INFO: 66epoch:train:3581-3938batch: iter_time=0.006, forward_time=0.098, loss_ctc=32.814, loss_att=16.264, acc=0.836, loss=21.229, backward_time=0.054, grad_norm=182.846, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.529e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:26:27,490 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:27:30,955 (trainer:732) INFO: 66epoch:train:3939-4296batch: iter_time=0.007, forward_time=0.098, loss_ctc=34.447, loss_att=17.179, acc=0.833, loss=22.359, backward_time=0.054, grad_norm=191.417, clip=100.000, loss_scale=556.459, optim_step_time=0.033, optim0_lr0=2.528e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:28:40,604 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:29:03,967 (trainer:732) INFO: 66epoch:train:4297-4654batch: iter_time=0.004, forward_time=0.101, loss_ctc=36.199, loss_att=17.989, acc=0.833, loss=23.452, backward_time=0.055, grad_norm=200.576, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.527e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:30:35,656 (trainer:732) INFO: 66epoch:train:4655-5012batch: iter_time=0.003, forward_time=0.100, loss_ctc=34.839, loss_att=17.344, acc=0.832, loss=22.593, backward_time=0.054, grad_norm=185.078, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.526e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:32:07,781 (trainer:732) INFO: 66epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.099, loss_ctc=34.844, loss_att=17.234, acc=0.835, loss=22.517, backward_time=0.053, grad_norm=182.573, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.525e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:33:41,093 (trainer:732) INFO: 66epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.100, loss_ctc=34.378, loss_att=17.058, acc=0.835, loss=22.254, backward_time=0.053, grad_norm=188.538, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.524e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:34:44,536 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:35:13,718 (trainer:732) INFO: 66epoch:train:5729-6086batch: iter_time=0.004, forward_time=0.100, loss_ctc=34.803, loss_att=17.229, acc=0.835, loss=22.501, backward_time=0.055, grad_norm=186.736, clip=100.000, loss_scale=567.777, optim_step_time=0.033, optim0_lr0=2.523e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:35:51,197 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:36:47,058 (trainer:732) INFO: 66epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.100, loss_ctc=36.417, loss_att=18.148, acc=0.831, loss=23.629, backward_time=0.053, grad_norm=189.720, clip=100.000, loss_scale=717.087, optim_step_time=0.033, optim0_lr0=2.522e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:38:20,447 (trainer:732) INFO: 66epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.099, loss_ctc=35.442, loss_att=17.633, acc=0.834, loss=22.975, backward_time=0.055, grad_norm=194.443, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.521e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:39:54,552 (trainer:732) INFO: 66epoch:train:6803-7160batch: iter_time=0.008, forward_time=0.100, loss_ctc=35.927, loss_att=17.843, acc=0.832, loss=23.268, backward_time=0.053, grad_norm=186.757, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.520e-05, train_time=0.263 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:41:03,288 (trainer:338) INFO: 66epoch results: [train] iter_time=0.004, forward_time=0.099, loss_ctc=34.815, loss_att=17.296, acc=0.834, loss=22.552, backward_time=0.053, grad_norm=187.532, clip=100.000, loss_scale=455.707, optim_step_time=0.033, optim0_lr0=2.530e-05, train_time=0.256, time=30 
minutes and 38.35 seconds, total_count=472626, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=16.246, cer_ctc=0.086, loss_att=8.505, acc=0.918, cer=0.052, wer=0.694, loss=10.827, time=14.47 seconds, total_count=3498, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.54 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:41:07,035 (trainer:384) INFO: There are no improvements in this epoch [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:41:07,061 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/56epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:41:07,062 (trainer:272) INFO: 67/100epoch started. Estimated time to finish: 18 hours, 10 minutes and 3.56 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:42:38,640 (trainer:732) INFO: 67epoch:train:1-358batch: iter_time=0.003, forward_time=0.099, loss_ctc=36.701, loss_att=18.223, acc=0.829, loss=23.766, backward_time=0.053, grad_norm=196.139, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.519e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:44:09,443 (trainer:732) INFO: 67epoch:train:359-716batch: iter_time=0.002, forward_time=0.099, loss_ctc=34.381, loss_att=17.024, acc=0.835, loss=22.232, backward_time=0.053, grad_norm=181.867, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.519e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:45:39,990 (trainer:732) INFO: 67epoch:train:717-1074batch: iter_time=0.002, forward_time=0.099, loss_ctc=34.711, loss_att=17.249, acc=0.835, loss=22.488, backward_time=0.053, grad_norm=180.810, clip=100.000, loss_scale=519.151, optim_step_time=0.033, optim0_lr0=2.518e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 
04:45:43,431 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:45:44,037 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:45:44,575 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:47:11,849 (trainer:732) INFO: 67epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.100, loss_ctc=34.326, loss_att=17.063, acc=0.835, loss=22.242, backward_time=0.054, grad_norm=188.094, clip=100.000, loss_scale=289.079, optim_step_time=0.033, optim0_lr0=2.517e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:48:42,733 (trainer:732) INFO: 67epoch:train:1433-1790batch: iter_time=4.961e-04, forward_time=0.100, loss_ctc=36.237, loss_att=18.002, acc=0.833, loss=23.473, backward_time=0.053, grad_norm=190.932, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.516e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:50:12,216 (trainer:732) INFO: 67epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.097, loss_ctc=34.326, loss_att=17.048, acc=0.832, loss=22.231, backward_time=0.053, grad_norm=184.754, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.515e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:51:44,428 (trainer:732) INFO: 67epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.101, loss_ctc=33.944, loss_att=16.865, acc=0.836, loss=21.989, backward_time=0.053, grad_norm=180.446, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=2.514e-05, train_time=0.257 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:53:17,334 (trainer:732) INFO: 67epoch:train:2507-2864batch: iter_time=0.005, forward_time=0.101, loss_ctc=33.746, loss_att=16.717, acc=0.837, loss=21.826, backward_time=0.054, grad_norm=186.420, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.513e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:54:49,244 (trainer:732) INFO: 67epoch:train:2865-3222batch: iter_time=0.002, forward_time=0.100, loss_ctc=36.186, loss_att=18.043, acc=0.832, loss=23.486, backward_time=0.053, grad_norm=191.580, clip=100.000, loss_scale=348.961, optim_step_time=0.033, optim0_lr0=2.512e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:56:20,940 (trainer:732) INFO: 67epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.099, loss_ctc=35.258, loss_att=17.503, acc=0.832, loss=22.830, backward_time=0.054, grad_norm=184.328, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.511e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:56:26,663 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:57:52,288 (trainer:732) INFO: 67epoch:train:3581-3938batch: iter_time=0.002, forward_time=0.099, loss_ctc=36.291, loss_att=17.981, acc=0.833, loss=23.474, backward_time=0.055, grad_norm=192.040, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.510e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 04:59:23,273 (trainer:732) INFO: 67epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.098, loss_ctc=34.317, loss_att=17.120, acc=0.835, loss=22.279, backward_time=0.053, grad_norm=189.782, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.509e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:00:55,004 (trainer:732) INFO: 67epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.099, loss_ctc=33.791, loss_att=16.700, acc=0.837, loss=21.827, backward_time=0.053, grad_norm=183.070, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.508e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:02:27,034 (trainer:732) INFO: 67epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.099, loss_ctc=33.684, loss_att=16.720, acc=0.835, loss=21.809, backward_time=0.054, grad_norm=186.750, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.507e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:03:34,559 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:03:59,111 (trainer:732) INFO: 67epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.098, loss_ctc=33.045, loss_att=16.371, acc=0.838, loss=21.373, backward_time=0.055, grad_norm=182.549, clip=100.000, loss_scale=771.585, optim_step_time=0.033, optim0_lr0=2.506e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:05:30,864 (trainer:732) INFO: 67epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.099, loss_ctc=36.257, loss_att=17.972, acc=0.832, loss=23.458, backward_time=0.053, grad_norm=193.527, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.505e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:07:03,902 (trainer:732) INFO: 67epoch:train:5729-6086batch: iter_time=0.008, forward_time=0.099, loss_ctc=32.694, loss_att=16.205, acc=0.839, loss=21.152, backward_time=0.054, grad_norm=188.037, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.504e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:08:39,322 (trainer:732) INFO: 67epoch:train:6087-6444batch: iter_time=0.014, forward_time=0.100, loss_ctc=32.814, loss_att=16.254, acc=0.839, loss=21.222, backward_time=0.054, grad_norm=181.545, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.503e-05, train_time=0.266 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:10:11,862 (trainer:732) INFO: 67epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.099, loss_ctc=34.379, loss_att=17.035, acc=0.837, loss=22.238, backward_time=0.054, grad_norm=179.047, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.502e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:10:23,573 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may 
cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:11:45,428 (trainer:732) INFO: 67epoch:train:6803-7160batch: iter_time=0.006, forward_time=0.101, loss_ctc=35.973, loss_att=17.869, acc=0.833, loss=23.300, backward_time=0.053, grad_norm=193.058, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.502e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:12:54,778 (trainer:338) INFO: 67epoch results: [train] iter_time=0.004, forward_time=0.099, loss_ctc=34.622, loss_att=17.182, acc=0.835, loss=22.414, backward_time=0.054, grad_norm=186.756, clip=100.000, loss_scale=454.849, optim_step_time=0.033, optim0_lr0=2.510e-05, train_time=0.257, time=30 minutes and 39.09 seconds, total_count=479787, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=16.153, cer_ctc=0.086, loss_att=8.466, acc=0.917, cer=0.052, wer=0.695, loss=10.772, time=14.44 seconds, total_count=3551, gpu_max_cached_mem_GB=28.453, [att_plot] time=54.18 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:12:58,544 (trainer:384) INFO: There are no improvements in this epoch [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:12:58,571 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/57epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:12:58,571 (trainer:272) INFO: 68/100epoch started. Estimated time to finish: 17 hours, 37 minutes and 53.77 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:13:39,186 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:14:30,603 (trainer:732) INFO: 68epoch:train:1-358batch: iter_time=0.004, forward_time=0.100, loss_ctc=32.959, loss_att=16.334, acc=0.840, loss=21.321, backward_time=0.053, grad_norm=182.414, clip=100.000, loss_scale=575.104, optim_step_time=0.033, optim0_lr0=2.501e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:16:01,122 (trainer:732) INFO: 68epoch:train:359-716batch: iter_time=0.004, forward_time=0.097, loss_ctc=32.811, loss_att=16.251, acc=0.838, loss=21.219, backward_time=0.053, grad_norm=178.349, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.500e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:17:33,018 (trainer:732) INFO: 68epoch:train:717-1074batch: iter_time=0.001, forward_time=0.101, loss_ctc=34.921, loss_att=17.382, acc=0.835, loss=22.644, backward_time=0.054, grad_norm=189.271, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.499e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:19:04,302 (trainer:732) INFO: 68epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.099, loss_ctc=35.152, loss_att=17.520, acc=0.833, loss=22.809, backward_time=0.053, grad_norm=190.272, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.498e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:19:30,434 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:20:34,300 (trainer:732) INFO: 68epoch:train:1433-1790batch: iter_time=7.446e-04, forward_time=0.099, loss_ctc=34.788, loss_att=17.256, acc=0.833, loss=22.515, backward_time=0.053, grad_norm=185.777, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.497e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:21:17,052 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:22:05,124 (trainer:732) INFO: 68epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=34.924, loss_att=17.217, acc=0.839, loss=22.529, backward_time=0.054, grad_norm=190.624, clip=100.000, loss_scale=378.622, optim_step_time=0.033, optim0_lr0=2.496e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:23:11,781 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:23:37,024 (trainer:732) INFO: 68epoch:train:2149-2506batch: iter_time=0.004, forward_time=0.100, loss_ctc=33.701, loss_att=16.688, acc=0.839, loss=21.792, backward_time=0.053, grad_norm=181.010, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.495e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:24:27,703 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:25:12,017 (trainer:732) INFO: 68epoch:train:2507-2864batch: iter_time=0.012, forward_time=0.100, loss_ctc=34.163, loss_att=16.939, acc=0.837, loss=22.106, backward_time=0.053, grad_norm=185.968, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.494e-05, train_time=0.265 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:26:43,993 (trainer:732) INFO: 68epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.099, loss_ctc=36.102, loss_att=17.928, acc=0.834, loss=23.380, backward_time=0.053, grad_norm=190.699, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.493e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:28:15,177 (trainer:732) INFO: 68epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.098, loss_ctc=33.553, loss_att=16.655, acc=0.836, loss=21.725, backward_time=0.053, grad_norm=186.554, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.492e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:29:47,605 (trainer:732) INFO: 68epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.099, loss_ctc=34.159, loss_att=16.960, acc=0.837, loss=22.120, backward_time=0.053, grad_norm=189.554, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.491e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:31:20,523 (trainer:732) INFO: 68epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.101, loss_ctc=34.818, loss_att=17.267, acc=0.835, loss=22.532, backward_time=0.053, grad_norm=187.311, clip=100.000, loss_scale=494.838, optim_step_time=0.033, optim0_lr0=2.490e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:32:52,435 (trainer:732) INFO: 68epoch:train:4297-4654batch: iter_time=0.005, 
forward_time=0.099, loss_ctc=34.303, loss_att=17.035, acc=0.836, loss=22.216, backward_time=0.054, grad_norm=184.009, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.490e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:34:23,922 (trainer:732) INFO: 68epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.098, loss_ctc=34.930, loss_att=17.359, acc=0.834, loss=22.631, backward_time=0.054, grad_norm=187.656, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.489e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:35:57,439 (trainer:732) INFO: 68epoch:train:5013-5370batch: iter_time=0.009, forward_time=0.099, loss_ctc=34.278, loss_att=17.028, acc=0.837, loss=22.203, backward_time=0.053, grad_norm=183.044, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.488e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:37:29,274 (trainer:732) INFO: 68epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.100, loss_ctc=34.788, loss_att=17.299, acc=0.835, loss=22.545, backward_time=0.053, grad_norm=186.300, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.487e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:39:01,454 (trainer:732) INFO: 68epoch:train:5729-6086batch: iter_time=0.006, forward_time=0.099, loss_ctc=34.303, loss_att=16.938, acc=0.837, loss=22.148, backward_time=0.053, grad_norm=178.943, clip=100.000, loss_scale=689.341, optim_step_time=0.033, optim0_lr0=2.486e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:39:21,351 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:40:34,655 (trainer:732) INFO: 68epoch:train:6087-6444batch: iter_time=0.008, forward_time=0.100, loss_ctc=34.263, loss_att=16.967, acc=0.836, loss=22.156, backward_time=0.053, grad_norm=185.263, clip=100.000, loss_scale=619.563, optim_step_time=0.033, optim0_lr0=2.485e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:42:07,736 (trainer:732) INFO: 68epoch:train:6445-6802batch: iter_time=0.008, forward_time=0.099, loss_ctc=34.206, loss_att=16.952, acc=0.838, loss=22.128, backward_time=0.054, grad_norm=183.602, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.484e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:43:40,021 (trainer:732) INFO: 68epoch:train:6803-7160batch: iter_time=0.005, forward_time=0.099, loss_ctc=34.799, loss_att=17.285, acc=0.832, loss=22.539, backward_time=0.054, grad_norm=185.541, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.483e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:44:49,128 (trainer:338) INFO: 68epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=34.387, loss_att=17.058, acc=0.836, loss=22.257, backward_time=0.053, grad_norm=185.607, clip=100.000, loss_scale=457.853, optim_step_time=0.033, optim0_lr0=2.492e-05, train_time=0.257, time=30 minutes and 42.17 seconds, total_count=486948, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=15.967, cer_ctc=0.084, loss_att=8.388, acc=0.919, cer=0.051, wer=0.688, loss=10.662, time=14.46 seconds, total_count=3604, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.92 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:44:52,880 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 
2023-05-14 05:44:52,908 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/58epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:44:52,908 (trainer:272) INFO: 69/100epoch started. Estimated time to finish: 17 hours, 5 minutes and 45.9 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:45:25,157 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:46:25,006 (trainer:732) INFO: 69epoch:train:1-358batch: iter_time=0.003, forward_time=0.100, loss_ctc=34.636, loss_att=17.141, acc=0.835, loss=22.390, backward_time=0.054, grad_norm=187.790, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.482e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:47:56,279 (trainer:732) INFO: 69epoch:train:359-716batch: iter_time=0.002, forward_time=0.100, loss_ctc=33.176, loss_att=16.458, acc=0.839, loss=21.474, backward_time=0.055, grad_norm=186.615, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.481e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:48:34,397 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:49:26,196 (trainer:732) INFO: 69epoch:train:717-1074batch: iter_time=4.223e-04, forward_time=0.099, loss_ctc=33.815, loss_att=16.721, acc=0.840, loss=21.849, backward_time=0.053, grad_norm=192.532, clip=100.000, loss_scale=363.563, optim_step_time=0.033, optim0_lr0=2.480e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:50:57,770 (trainer:732) INFO: 69epoch:train:1075-1432batch: iter_time=8.757e-04, forward_time=0.100, loss_ctc=36.257, loss_att=18.012, acc=0.832, loss=23.485, backward_time=0.054, grad_norm=188.806, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.479e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:52:28,700 (trainer:732) INFO: 69epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.099, loss_ctc=33.008, loss_att=16.373, acc=0.839, loss=21.364, backward_time=0.053, grad_norm=185.657, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.479e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:52:48,645 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:53:59,382 (trainer:732) INFO: 69epoch:train:1791-2148batch: iter_time=0.001, forward_time=0.099, loss_ctc=34.573, loss_att=17.078, acc=0.838, loss=22.326, backward_time=0.053, grad_norm=185.647, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.478e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:55:31,963 (trainer:732) INFO: 69epoch:train:2149-2506batch: iter_time=0.005, forward_time=0.100, loss_ctc=34.497, loss_att=17.122, acc=0.834, loss=22.335, backward_time=0.054, grad_norm=185.088, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.477e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:57:04,491 (trainer:732) INFO: 69epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.101, loss_ctc=35.873, loss_att=17.758, acc=0.836, loss=23.192, backward_time=0.053, grad_norm=190.477, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.476e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:57:32,300 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 05:58:36,688 (trainer:732) INFO: 69epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.100, loss_ctc=36.689, loss_att=18.204, acc=0.835, loss=23.750, backward_time=0.054, grad_norm=197.432, clip=100.000, loss_scale=328.426, optim_step_time=0.033, optim0_lr0=2.475e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:00:08,778 (trainer:732) INFO: 69epoch:train:3223-3580batch: iter_time=0.004, forward_time=0.100, loss_ctc=34.426, loss_att=17.106, acc=0.836, loss=22.302, backward_time=0.054, grad_norm=187.155, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.474e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:01:05,707 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:01:40,483 (trainer:732) INFO: 69epoch:train:3581-3938batch: iter_time=0.007, forward_time=0.098, loss_ctc=31.874, loss_att=15.809, acc=0.841, loss=20.628, backward_time=0.054, grad_norm=175.316, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.473e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:03:11,478 (trainer:732) INFO: 69epoch:train:3939-4296batch: iter_time=0.003, forward_time=0.099, loss_ctc=34.066, loss_att=16.836, acc=0.836, loss=22.005, backward_time=0.053, grad_norm=186.956, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.472e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:04:45,022 (trainer:732) INFO: 69epoch:train:4297-4654batch: iter_time=0.008, forward_time=0.100, loss_ctc=33.723, loss_att=16.743, acc=0.839, loss=21.837, backward_time=0.053, grad_norm=184.558, clip=100.000, 
loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.471e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:06:16,793 (trainer:732) INFO: 69epoch:train:4655-5012batch: iter_time=0.006, forward_time=0.099, loss_ctc=33.950, loss_att=16.871, acc=0.835, loss=21.995, backward_time=0.054, grad_norm=181.619, clip=100.000, loss_scale=286.749, optim_step_time=0.033, optim0_lr0=2.470e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:07:48,848 (trainer:732) INFO: 69epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.099, loss_ctc=34.129, loss_att=16.895, acc=0.839, loss=22.066, backward_time=0.053, grad_norm=182.872, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.470e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:08:23,022 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:09:21,128 (trainer:732) INFO: 69epoch:train:5371-5728batch: iter_time=0.005, forward_time=0.099, loss_ctc=34.389, loss_att=17.020, acc=0.838, loss=22.231, backward_time=0.053, grad_norm=187.269, clip=100.000, loss_scale=349.938, optim_step_time=0.033, optim0_lr0=2.469e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:10:53,492 (trainer:732) INFO: 69epoch:train:5729-6086batch: iter_time=0.007, forward_time=0.098, loss_ctc=33.866, loss_att=16.777, acc=0.838, loss=21.904, backward_time=0.055, grad_norm=182.929, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.468e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:12:27,493 (trainer:732) INFO: 69epoch:train:6087-6444batch: iter_time=0.007, forward_time=0.100, loss_ctc=34.634, loss_att=17.248, acc=0.836, loss=22.464, backward_time=0.055, grad_norm=191.818, clip=100.000, 
loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.467e-05, train_time=0.262 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:14:00,881 (trainer:732) INFO: 69epoch:train:6445-6802batch: iter_time=0.008, forward_time=0.099, loss_ctc=34.381, loss_att=17.009, acc=0.838, loss=22.220, backward_time=0.053, grad_norm=182.972, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.466e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:15:33,401 (trainer:732) INFO: 69epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.098, loss_ctc=32.869, loss_att=16.382, acc=0.838, loss=21.328, backward_time=0.054, grad_norm=179.231, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.465e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:16:42,141 (trainer:338) INFO: 69epoch results: [train] iter_time=0.004, forward_time=0.099, loss_ctc=34.220, loss_att=16.968, acc=0.837, loss=22.144, backward_time=0.054, grad_norm=186.130, clip=100.000, loss_scale=309.611, optim_step_time=0.033, optim0_lr0=2.474e-05, train_time=0.257, time=30 minutes and 41.16 seconds, total_count=494109, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=15.866, cer_ctc=0.084, loss_att=8.292, acc=0.920, cer=0.051, wer=0.691, loss=10.564, time=14.42 seconds, total_count=3657, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.65 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:16:45,868 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:16:45,896 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/59epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:16:45,896 (trainer:272) INFO: 70/100epoch started. 
Estimated time to finish: 16 hours, 33 minutes and 37.82 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:17:45,300 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:18:17,375 (trainer:732) INFO: 70epoch:train:1-358batch: iter_time=0.003, forward_time=0.099, loss_ctc=31.967, loss_att=15.807, acc=0.839, loss=20.655, backward_time=0.055, grad_norm=180.000, clip=100.000, loss_scale=268.156, optim_step_time=0.033, optim0_lr0=2.464e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:19:49,255 (trainer:732) INFO: 70epoch:train:359-716batch: iter_time=0.001, forward_time=0.101, loss_ctc=35.472, loss_att=17.531, acc=0.837, loss=22.913, backward_time=0.053, grad_norm=195.589, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.463e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:21:19,614 (trainer:732) INFO: 70epoch:train:717-1074batch: iter_time=0.001, forward_time=0.099, loss_ctc=34.524, loss_att=17.145, acc=0.836, loss=22.359, backward_time=0.053, grad_norm=184.605, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.462e-05, train_time=0.252 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:22:02,418 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:22:50,137 (trainer:732) INFO: 70epoch:train:1075-1432batch: iter_time=0.001, forward_time=0.099, loss_ctc=33.936, loss_att=16.799, acc=0.839, loss=21.940, backward_time=0.053, grad_norm=187.895, clip=100.000, loss_scale=376.471, optim_step_time=0.033, optim0_lr0=2.461e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:24:19,714 (trainer:732) INFO: 70epoch:train:1433-1790batch: iter_time=6.332e-04, forward_time=0.098, loss_ctc=34.088, loss_att=16.926, acc=0.836, loss=22.075, backward_time=0.054, grad_norm=185.899, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.461e-05, train_time=0.250 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:25:52,545 (trainer:732) INFO: 70epoch:train:1791-2148batch: iter_time=0.005, forward_time=0.101, loss_ctc=34.313, loss_att=16.977, acc=0.838, loss=22.178, backward_time=0.053, grad_norm=186.595, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.460e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:27:23,547 (trainer:732) INFO: 70epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.099, loss_ctc=34.322, loss_att=17.038, acc=0.837, loss=22.223, backward_time=0.053, grad_norm=190.311, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.459e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:28:54,141 (trainer:732) INFO: 70epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.099, loss_ctc=32.857, loss_att=16.219, acc=0.838, loss=21.211, backward_time=0.054, grad_norm=187.982, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.458e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:30:25,698 (trainer:732) INFO: 70epoch:train:2865-3222batch: iter_time=0.005, 
forward_time=0.099, loss_ctc=33.408, loss_att=16.482, acc=0.841, loss=21.560, backward_time=0.053, grad_norm=188.267, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.457e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:31:57,769 (trainer:732) INFO: 70epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.100, loss_ctc=33.982, loss_att=16.842, acc=0.838, loss=21.984, backward_time=0.053, grad_norm=188.018, clip=100.000, loss_scale=496.983, optim_step_time=0.033, optim0_lr0=2.456e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:32:13,101 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:33:30,861 (trainer:732) INFO: 70epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.100, loss_ctc=36.819, loss_att=18.281, acc=0.835, loss=23.842, backward_time=0.053, grad_norm=197.840, clip=100.000, loss_scale=298.308, optim_step_time=0.033, optim0_lr0=2.455e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:35:03,692 (trainer:732) INFO: 70epoch:train:3939-4296batch: iter_time=0.007, forward_time=0.099, loss_ctc=34.365, loss_att=17.121, acc=0.839, loss=22.294, backward_time=0.053, grad_norm=193.679, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.454e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:35:24,814 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:36:36,515 (trainer:732) INFO: 70epoch:train:4297-4654batch: iter_time=0.006, forward_time=0.100, loss_ctc=33.415, loss_att=16.607, acc=0.839, loss=21.650, backward_time=0.053, grad_norm=190.754, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.453e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:38:08,084 (trainer:732) INFO: 70epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.099, loss_ctc=33.649, loss_att=16.657, acc=0.839, loss=21.755, backward_time=0.053, grad_norm=181.287, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.453e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:39:40,747 (trainer:732) INFO: 70epoch:train:5013-5370batch: iter_time=0.007, forward_time=0.099, loss_ctc=33.003, loss_att=16.375, acc=0.839, loss=21.363, backward_time=0.053, grad_norm=183.707, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.452e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:39:41,850 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:41:13,965 (trainer:732) INFO: 70epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.100, loss_ctc=35.185, loss_att=17.460, acc=0.837, loss=22.778, backward_time=0.053, grad_norm=191.244, clip=100.000, loss_scale=318.927, optim_step_time=0.033, optim0_lr0=2.451e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:42:45,363 (trainer:732) INFO: 70epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.099, loss_ctc=34.078, loss_att=16.836, acc=0.837, loss=22.009, backward_time=0.053, grad_norm=182.742, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.450e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:44:16,916 (trainer:732) INFO: 70epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.099, loss_ctc=34.276, loss_att=16.949, acc=0.837, loss=22.147, backward_time=0.054, grad_norm=185.729, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.449e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:45:50,384 (trainer:732) INFO: 70epoch:train:6445-6802batch: iter_time=0.007, forward_time=0.100, loss_ctc=34.935, loss_att=17.368, acc=0.837, loss=22.638, backward_time=0.053, grad_norm=186.070, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.448e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:47:23,087 (trainer:732) INFO: 70epoch:train:6803-7160batch: iter_time=0.009, forward_time=0.098, loss_ctc=32.433, loss_att=16.061, acc=0.840, loss=20.973, backward_time=0.053, grad_norm=184.284, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.447e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:48:31,683 (trainer:338) INFO: 70epoch results: [train] iter_time=0.004, forward_time=0.099, 
loss_ctc=34.027, loss_att=16.861, acc=0.838, loss=22.011, backward_time=0.053, grad_norm=187.613, clip=100.000, loss_scale=356.769, optim_step_time=0.033, optim0_lr0=2.456e-05, train_time=0.256, time=30 minutes and 37.85 seconds, total_count=501270, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=15.673, cer_ctc=0.083, loss_att=8.239, acc=0.920, cer=0.050, wer=0.689, loss=10.469, time=14.3 seconds, total_count=3710, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.63 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:48:35,272 (trainer:384) INFO: There are no improvements in this epoch [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:48:35,302 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/60epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:48:35,302 (trainer:272) INFO: 71/100epoch started. Estimated time to finish: 16 hours, 1 minute and 28.6 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:50:06,436 (trainer:732) INFO: 71epoch:train:1-358batch: iter_time=0.002, forward_time=0.099, loss_ctc=33.522, loss_att=16.581, acc=0.839, loss=21.664, backward_time=0.053, grad_norm=190.692, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.446e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:50:57,279 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:51:38,022 (trainer:732) INFO: 71epoch:train:359-716batch: iter_time=0.003, forward_time=0.100, loss_ctc=34.300, loss_att=17.021, acc=0.838, loss=22.205, backward_time=0.053, grad_norm=190.984, clip=100.000, loss_scale=625.300, optim_step_time=0.033, optim0_lr0=2.446e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:53:08,885 (trainer:732) INFO: 71epoch:train:717-1074batch: iter_time=8.178e-04, forward_time=0.100, loss_ctc=32.466, loss_att=16.027, acc=0.842, loss=20.959, backward_time=0.055, grad_norm=183.854, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.445e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:54:40,996 (trainer:732) INFO: 71epoch:train:1075-1432batch: iter_time=0.003, forward_time=0.100, loss_ctc=34.929, loss_att=17.332, acc=0.839, loss=22.611, backward_time=0.053, grad_norm=187.610, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.444e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:56:11,527 (trainer:732) INFO: 71epoch:train:1433-1790batch: iter_time=7.580e-04, forward_time=0.099, loss_ctc=34.439, loss_att=17.035, acc=0.839, loss=22.256, backward_time=0.053, grad_norm=184.506, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.443e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:57:41,527 (trainer:732) INFO: 71epoch:train:1791-2148batch: iter_time=0.003, forward_time=0.098, loss_ctc=33.177, loss_att=16.440, acc=0.839, loss=21.461, backward_time=0.054, grad_norm=187.026, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.442e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:59:12,773 (trainer:732) INFO: 71epoch:train:2149-2506batch: iter_time=0.003, 
forward_time=0.099, loss_ctc=34.263, loss_att=16.919, acc=0.839, loss=22.122, backward_time=0.053, grad_norm=182.356, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.441e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 06:59:56,142 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:00:33,974 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:00:44,338 (trainer:732) INFO: 71epoch:train:2507-2864batch: iter_time=0.004, forward_time=0.099, loss_ctc=32.795, loss_att=16.201, acc=0.839, loss=21.179, backward_time=0.054, grad_norm=183.239, clip=100.000, loss_scale=648.629, optim_step_time=0.033, optim0_lr0=2.440e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:02:16,223 (trainer:732) INFO: 71epoch:train:2865-3222batch: iter_time=0.003, forward_time=0.100, loss_ctc=32.290, loss_att=15.923, acc=0.842, loss=20.833, backward_time=0.053, grad_norm=189.208, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.440e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:03:48,371 (trainer:732) INFO: 71epoch:train:3223-3580batch: iter_time=0.003, forward_time=0.100, loss_ctc=33.392, loss_att=16.465, acc=0.840, loss=21.544, backward_time=0.054, grad_norm=189.041, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.439e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:05:18,763 (trainer:732) INFO: 71epoch:train:3581-3938batch: iter_time=0.002, forward_time=0.099, loss_ctc=33.959, loss_att=16.780, acc=0.837, loss=21.934, backward_time=0.053, grad_norm=186.191, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.438e-05, train_time=0.252 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:06:50,214 (trainer:732) INFO: 71epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.099, loss_ctc=34.438, loss_att=17.106, acc=0.838, loss=22.305, backward_time=0.053, grad_norm=188.214, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.437e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:08:00,393 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:08:21,742 (trainer:732) INFO: 71epoch:train:4297-4654batch: iter_time=0.003, forward_time=0.100, loss_ctc=33.761, loss_att=16.698, acc=0.838, loss=21.817, backward_time=0.053, grad_norm=187.788, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.436e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:09:39,869 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:09:53,894 (trainer:732) INFO: 71epoch:train:4655-5012batch: iter_time=0.003, forward_time=0.101, loss_ctc=32.695, loss_att=16.157, acc=0.842, loss=21.119, backward_time=0.054, grad_norm=181.549, clip=100.000, loss_scale=391.866, optim_step_time=0.034, optim0_lr0=2.435e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:11:26,966 (trainer:732) INFO: 71epoch:train:5013-5370batch: iter_time=0.004, forward_time=0.101, loss_ctc=35.485, loss_att=17.583, acc=0.836, loss=22.954, backward_time=0.053, grad_norm=187.476, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.434e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:12:59,674 (trainer:732) INFO: 71epoch:train:5371-5728batch: iter_time=0.004, forward_time=0.100, loss_ctc=34.443, loss_att=17.046, acc=0.838, loss=22.265, backward_time=0.054, grad_norm=184.077, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.433e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:13:16,680 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:13:34,670 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:14:30,742 (trainer:732) INFO: 71epoch:train:5729-6086batch: iter_time=0.003, forward_time=0.099, loss_ctc=34.305, loss_att=16.969, acc=0.837, loss=22.170, backward_time=0.055, grad_norm=187.591, clip=100.000, loss_scale=304.045, optim_step_time=0.033, optim0_lr0=2.433e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:16:02,781 (trainer:732) INFO: 71epoch:train:6087-6444batch: iter_time=0.005, forward_time=0.099, loss_ctc=34.000, loss_att=16.866, acc=0.839, loss=22.006, backward_time=0.054, grad_norm=189.672, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.432e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:17:34,989 (trainer:732) INFO: 71epoch:train:6445-6802batch: iter_time=0.005, forward_time=0.099, loss_ctc=34.959, loss_att=17.358, acc=0.838, loss=22.638, backward_time=0.053, grad_norm=188.133, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.431e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:19:07,965 (trainer:732) INFO: 71epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.100, loss_ctc=33.952, loss_att=16.818, acc=0.840, loss=21.958, backward_time=0.053, grad_norm=188.301, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.430e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:20:16,917 (trainer:338) INFO: 71epoch results: [train] iter_time=0.003, forward_time=0.099, loss_ctc=33.865, loss_att=16.759, acc=0.839, loss=21.891, backward_time=0.054, grad_norm=186.869, clip=100.000, loss_scale=405.587, optim_step_time=0.033, optim0_lr0=2.438e-05, train_time=0.256, time=30 minutes and 33.33 seconds, total_count=508431, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=15.630, cer_ctc=0.082, loss_att=8.189, acc=0.920, cer=0.050, wer=0.679, 
loss=10.421, time=14.5 seconds, total_count=3763, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.78 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:20:20,576 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:20:20,604 (trainer:440) INFO: The model files were removed: exp/asr_train_raw_bpe2000_sp/61epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:20:20,604 (trainer:272) INFO: 72/100epoch started. Estimated time to finish: 15 hours, 29 minutes and 18.23 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:21:53,238 (trainer:732) INFO: 72epoch:train:1-358batch: iter_time=0.004, forward_time=0.100, loss_ctc=35.948, loss_att=17.786, acc=0.836, loss=23.235, backward_time=0.054, grad_norm=190.649, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.429e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:23:09,169 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:23:24,016 (trainer:732) INFO: 72epoch:train:359-716batch: iter_time=0.002, forward_time=0.099, loss_ctc=33.222, loss_att=16.411, acc=0.839, loss=21.454, backward_time=0.055, grad_norm=177.607, clip=100.000, loss_scale=313.922, optim_step_time=0.033, optim0_lr0=2.428e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:24:53,392 (trainer:732) INFO: 72epoch:train:717-1074batch: iter_time=0.002, forward_time=0.097, loss_ctc=34.208, loss_att=16.959, acc=0.839, loss=22.134, backward_time=0.053, grad_norm=188.667, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.427e-05, train_time=0.249 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:26:24,184 (trainer:732) INFO: 72epoch:train:1075-1432batch: iter_time=0.002, forward_time=0.099, loss_ctc=33.730, loss_att=16.743, acc=0.840, loss=21.839, backward_time=0.054, grad_norm=183.773, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.427e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:27:55,239 (trainer:732) INFO: 72epoch:train:1433-1790batch: iter_time=0.002, forward_time=0.098, loss_ctc=33.855, loss_att=16.723, acc=0.839, loss=21.863, backward_time=0.056, grad_norm=185.292, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.426e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:28:28,304 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:29:26,415 (trainer:732) INFO: 72epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=32.895, loss_att=16.284, acc=0.841, loss=21.268, backward_time=0.054, grad_norm=181.153, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.425e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:30:57,659 (trainer:732) INFO: 72epoch:train:2149-2506batch: iter_time=0.002, forward_time=0.099, loss_ctc=33.577, loss_att=16.677, acc=0.839, loss=21.747, backward_time=0.053, grad_norm=189.220, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.424e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:31:42,756 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:32:29,153 (trainer:732) INFO: 72epoch:train:2507-2864batch: iter_time=0.002, forward_time=0.100, loss_ctc=33.882, loss_att=16.757, acc=0.842, loss=21.895, backward_time=0.053, grad_norm=189.122, clip=100.000, loss_scale=582.275, optim_step_time=0.033, optim0_lr0=2.423e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:34:02,094 (trainer:732) INFO: 72epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.101, loss_ctc=35.033, loss_att=17.323, acc=0.837, loss=22.636, backward_time=0.053, grad_norm=191.166, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.422e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:35:36,270 (trainer:732) INFO: 72epoch:train:3223-3580batch: iter_time=0.008, forward_time=0.101, loss_ctc=34.110, loss_att=16.875, acc=0.839, loss=22.045, backward_time=0.053, grad_norm=186.452, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.422e-05, train_time=0.263 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:37:08,049 (trainer:732) INFO: 72epoch:train:3581-3938batch: iter_time=0.004, forward_time=0.099, loss_ctc=33.049, loss_att=16.367, acc=0.841, loss=21.372, backward_time=0.054, grad_norm=183.496, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.421e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:38:39,933 (trainer:732) INFO: 72epoch:train:3939-4296batch: iter_time=0.004, forward_time=0.100, loss_ctc=33.514, loss_att=16.525, acc=0.842, loss=21.622, backward_time=0.053, grad_norm=184.499, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.420e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:40:12,338 (trainer:732) INFO: 72epoch:train:4297-4654batch: iter_time=0.009, forward_time=0.098, loss_ctc=31.730, loss_att=15.730, acc=0.843, loss=20.530, backward_time=0.053, grad_norm=183.650, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.419e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:40:29,622 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:41:43,949 (trainer:732) INFO: 72epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.098, loss_ctc=33.788, loss_att=16.689, acc=0.840, loss=21.818, backward_time=0.053, grad_norm=185.901, clip=100.000, loss_scale=565.064, optim_step_time=0.033, optim0_lr0=2.418e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:43:17,334 (trainer:732) INFO: 72epoch:train:5013-5370batch: iter_time=0.005, forward_time=0.100, loss_ctc=34.672, loss_att=17.194, acc=0.840, loss=22.438, backward_time=0.053, grad_norm=187.146, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.417e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:44:49,409 (trainer:732) INFO: 72epoch:train:5371-5728batch: iter_time=0.006, forward_time=0.099, loss_ctc=32.963, loss_att=16.304, acc=0.842, loss=21.302, backward_time=0.054, grad_norm=183.156, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.416e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:46:21,307 (trainer:732) INFO: 72epoch:train:5729-6086batch: iter_time=0.005, forward_time=0.099, loss_ctc=33.446, loss_att=16.531, acc=0.840, loss=21.605, backward_time=0.055, grad_norm=183.953, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.416e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:47:54,999 (trainer:732) INFO: 72epoch:train:6087-6444batch: iter_time=0.009, forward_time=0.100, loss_ctc=32.691, loss_att=16.138, acc=0.842, loss=21.104, backward_time=0.053, grad_norm=181.774, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.415e-05, train_time=0.261 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:49:27,940 (trainer:732) INFO: 72epoch:train:6445-6802batch: iter_time=0.005, 
forward_time=0.100, loss_ctc=34.232, loss_att=16.918, acc=0.839, loss=22.112, backward_time=0.056, grad_norm=188.728, clip=100.000, loss_scale=624.983, optim_step_time=0.033, optim0_lr0=2.414e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:49:33,664 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:49:34,151 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:51:00,305 (trainer:732) INFO: 72epoch:train:6803-7160batch: iter_time=0.007, forward_time=0.099, loss_ctc=33.479, loss_att=16.610, acc=0.840, loss=21.670, backward_time=0.053, grad_norm=192.200, clip=100.000, loss_scale=544.986, optim_step_time=0.033, optim0_lr0=2.413e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:52:08,834 (trainer:338) INFO: 72epoch results: [train] iter_time=0.005, forward_time=0.099, loss_ctc=33.687, loss_att=16.670, acc=0.840, loss=21.775, backward_time=0.054, grad_norm=185.871, clip=100.000, loss_scale=502.737, optim_step_time=0.033, optim0_lr0=2.421e-05, train_time=0.257, time=30 minutes and 40.36 seconds, total_count=515592, gpu_max_cached_mem_GB=28.453, [valid] loss_ctc=15.630, cer_ctc=0.082, loss_att=8.167, acc=0.920, cer=0.050, wer=0.683, loss=10.406, time=14.4 seconds, total_count=3816, gpu_max_cached_mem_GB=28.453, [att_plot] time=53.47 seconds, total_count=0, gpu_max_cached_mem_GB=28.453 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:52:12,728 (trainer:386) INFO: The best model has been updated: valid.acc [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:52:12,756 (trainer:440) INFO: The model files were removed: 
exp/asr_train_raw_bpe2000_sp/62epoch.pth [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:52:12,756 (trainer:272) INFO: 73/100epoch started. Estimated time to finish: 14 hours, 57 minutes and 11.31 seconds [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:53:43,920 (trainer:732) INFO: 73epoch:train:1-358batch: iter_time=0.003, forward_time=0.099, loss_ctc=32.989, loss_att=16.308, acc=0.841, loss=21.313, backward_time=0.053, grad_norm=184.869, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.412e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:55:14,938 (trainer:732) INFO: 73epoch:train:359-716batch: iter_time=4.680e-04, forward_time=0.100, loss_ctc=34.275, loss_att=16.934, acc=0.839, loss=22.136, backward_time=0.052, grad_norm=185.802, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.411e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:56:45,055 (trainer:732) INFO: 73epoch:train:717-1074batch: iter_time=0.002, forward_time=0.098, loss_ctc=34.147, loss_att=16.918, acc=0.842, loss=22.087, backward_time=0.052, grad_norm=183.355, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.411e-05, train_time=0.251 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:58:15,838 (trainer:732) INFO: 73epoch:train:1075-1432batch: iter_time=5.873e-04, forward_time=0.100, loss_ctc=34.276, loss_att=16.972, acc=0.840, loss=22.163, backward_time=0.053, grad_norm=185.418, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.410e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 07:59:46,453 (trainer:732) INFO: 73epoch:train:1433-1790batch: iter_time=0.003, forward_time=0.098, loss_ctc=32.625, loss_att=16.075, acc=0.843, loss=21.040, backward_time=0.054, grad_norm=181.739, clip=100.000, 
loss_scale=690.771, optim_step_time=0.033, optim0_lr0=2.409e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:00:17,736 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:01:17,354 (trainer:732) INFO: 73epoch:train:1791-2148batch: iter_time=0.002, forward_time=0.099, loss_ctc=34.698, loss_att=17.184, acc=0.840, loss=22.438, backward_time=0.055, grad_norm=190.913, clip=100.000, loss_scale=691.272, optim_step_time=0.033, optim0_lr0=2.408e-05, train_time=0.254 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:02:48,096 (trainer:732) INFO: 73epoch:train:2149-2506batch: iter_time=0.001, forward_time=0.099, loss_ctc=34.747, loss_att=17.155, acc=0.839, loss=22.432, backward_time=0.054, grad_norm=193.690, clip=100.000, loss_scale=512.000, optim_step_time=0.033, optim0_lr0=2.407e-05, train_time=0.253 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:02:55,470 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:04:19,621 (trainer:732) INFO: 73epoch:train:2507-2864batch: iter_time=0.003, forward_time=0.100, loss_ctc=32.554, loss_att=16.095, acc=0.843, loss=21.033, backward_time=0.054, grad_norm=184.312, clip=100.000, loss_scale=275.361, optim_step_time=0.033, optim0_lr0=2.406e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:05:37,146 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. 
[mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:05:52,808 (trainer:732) INFO: 73epoch:train:2865-3222batch: iter_time=0.004, forward_time=0.101, loss_ctc=33.201, loss_att=16.511, acc=0.842, loss=21.518, backward_time=0.054, grad_norm=186.271, clip=100.000, loss_scale=256.000, optim_step_time=0.034, optim0_lr0=2.406e-05, train_time=0.260 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:07:24,733 (trainer:732) INFO: 73epoch:train:3223-3580batch: iter_time=0.006, forward_time=0.098, loss_ctc=31.259, loss_att=15.449, acc=0.845, loss=20.192, backward_time=0.053, grad_norm=182.266, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.405e-05, train_time=0.257 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:08:48,575 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:08:55,950 (trainer:732) INFO: 73epoch:train:3581-3938batch: iter_time=0.002, forward_time=0.100, loss_ctc=33.680, loss_att=16.606, acc=0.841, loss=21.728, backward_time=0.054, grad_norm=185.984, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.404e-05, train_time=0.255 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:10:27,781 (trainer:732) INFO: 73epoch:train:3939-4296batch: iter_time=0.007, forward_time=0.098, loss_ctc=32.415, loss_att=16.006, acc=0.843, loss=20.929, backward_time=0.053, grad_norm=183.181, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.403e-05, train_time=0.256 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:12:00,423 (trainer:732) INFO: 73epoch:train:4297-4654batch: iter_time=0.005, forward_time=0.100, loss_ctc=33.946, loss_att=16.840, acc=0.840, loss=21.972, backward_time=0.053, grad_norm=188.723, clip=100.000, 
loss_scale=341.810, optim_step_time=0.033, optim0_lr0=2.402e-05, train_time=0.259 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:12:14,448 (trainer:663) WARNING: The grad norm is inf. Skipping updating the model. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:12:32,050 (preprocessor:336) WARNING: The length of the text output exceeds 100, which may cause OOM on the GPU.Please ensure that the data processing is correct and verify it. [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:13:32,895 (trainer:732) INFO: 73epoch:train:4655-5012batch: iter_time=0.005, forward_time=0.099, loss_ctc=34.806, loss_att=17.235, acc=0.837, loss=22.506, backward_time=0.053, grad_norm=197.161, clip=100.000, loss_scale=294.723, optim_step_time=0.033, optim0_lr0=2.401e-05, train_time=0.258 [mlxlabq1l19yow63f8475a-20230224051258-1mabjw-o3zjcg-worker] 2023-05-14 08:15:06,707 (trainer:732) INFO: 73epoch:train:5013-5370batch: iter_time=0.011, forward_time=0.099, loss_ctc=32.688, loss_att=16.149, acc=0.842, loss=21.111, backward_time=0.053, grad_norm=185.098, clip=100.000, loss_scale=256.000, optim_step_time=0.033, optim0_lr0=2.401e-05, train_time=0.262