07/10/2024 15:30:43 - INFO - llamafactory.hparams.parser - Process rank: 2, device: cuda:2, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/10/2024 15:30:43 - INFO - llamafactory.hparams.parser - Process rank: 3, device: cuda:3, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|parser.py:325] 2024-07-10 15:30:43,487 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/10/2024 15:30:43 - INFO - llamafactory.hparams.parser - Process rank: 4, device: cuda:4, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/10/2024 15:30:43 - INFO - llamafactory.hparams.parser - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/10/2024 15:30:43 - INFO - llamafactory.hparams.parser - Process rank: 7, device: cuda:7, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/10/2024 15:30:43 - INFO - llamafactory.hparams.parser - Process rank: 5, device: cuda:5, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/10/2024 15:30:43 - INFO - llamafactory.hparams.parser - Process rank: 6, device: cuda:6, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/10/2024 15:30:43 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
07/10/2024 15:30:43 - INFO - llamafactory.data.template - Add pad token: <|end_of_text|>
07/10/2024 15:30:44 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
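This capture mixes two record formats: the per-rank `llamafactory` lines (`MM/DD/YYYY HH:MM:SS - LEVEL - logger - message`) and the rank-0 lines in the `[LEVEL|file.py:line] YYYY-MM-DD HH:MM:SS,mmm >> message` form. If a copy of the log loses its newlines, the records can be split apart again with a lookahead on those two prefixes. This is a minimal sketch; the regex is an assumption fitted to the prefixes visible in this log, not a format documented by either library, and `split_records` is a hypothetical helper name.

```python
import re

# Lookahead on the two record prefixes seen in this log (an assumption, not a spec):
#   07/10/2024 15:30:43 - INFO - llamafactory...   (per-rank format)
#   [INFO|parser.py:325] 2024-07-10 ...            (rank-0 format)
RECORD_START = re.compile(
    r"(?=\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} - "
    r"|\[(?:INFO|WARNING|ERROR)\|)"
)


def split_records(blob: str) -> list[str]:
    """Split a newline-less log capture into individual records (hypothetical helper)."""
    # A zero-width split keeps each prefix attached to its own record.
    return [part.strip() for part in RECORD_START.split(blob) if part.strip()]


sample = (
    "07/10/2024 15:30:43 - INFO - llamafactory.hparams.parser - Process rank: 2, "
    "device: cuda:2 [INFO|parser.py:325] 2024-07-10 15:30:43,487 >> Process rank: 0, "
    "device: cuda:0"
)
records = split_records(sample)
for record in records:
    print(record)
```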
07/10/2024 15:30:44 - INFO - llamafactory.data.template - Add pad token: <|end_of_text|>
[INFO|tokenization_utils_base.py:2161] 2024-07-10 15:30:43,965 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B/snapshots/62bd457b6fe961a42a631306577e622c83876cb6/tokenizer.json
[INFO|tokenization_utils_base.py:2161] 2024-07-10 15:30:43,965 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2161] 2024-07-10 15:30:43,965 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B/snapshots/62bd457b6fe961a42a631306577e622c83876cb6/special_tokens_map.json
[INFO|tokenization_utils_base.py:2161] 2024-07-10 15:30:43,965 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B/snapshots/62bd457b6fe961a42a631306577e622c83876cb6/tokenizer_config.json
[WARNING|logging.py:313] 2024-07-10 15:30:44,255 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|template.py:372] 2024-07-10 15:30:44,255 >> Add pad token: <|end_of_text|>
[INFO|loader.py:50] 2024-07-10 15:30:44,255 >> Loading dataset train_output.json...
07/10/2024 15:30:44 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
07/10/2024 15:30:44 - INFO - llamafactory.data.template - Add pad token: <|end_of_text|>
07/10/2024 15:30:44 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
07/10/2024 15:30:44 - INFO - llamafactory.data.template - Add pad token: <|end_of_text|>
07/10/2024 15:30:44 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
07/10/2024 15:30:44 - INFO - llamafactory.data.template - Add pad token: <|end_of_text|>
07/10/2024 15:30:44 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
07/10/2024 15:30:44 - INFO - llamafactory.data.template - Add pad token: <|end_of_text|>
07/10/2024 15:30:44 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
07/10/2024 15:30:44 - INFO - llamafactory.data.template - Add pad token: <|end_of_text|>
07/10/2024 15:30:45 - INFO - llamafactory.data.loader - Loading dataset train_output.json...
07/10/2024 15:30:45 - INFO - llamafactory.data.loader - Loading dataset train_output.json...
07/10/2024 15:30:45 - INFO - llamafactory.data.loader - Loading dataset train_output.json...
07/10/2024 15:30:45 - INFO - llamafactory.data.loader - Loading dataset train_output.json...
07/10/2024 15:30:45 - INFO - llamafactory.data.loader - Loading dataset train_output.json...
07/10/2024 15:30:45 - INFO - llamafactory.data.loader - Loading dataset train_output.json...
07/10/2024 15:30:45 - INFO - llamafactory.data.loader - Loading dataset train_output.json...
[INFO|configuration_utils.py:733] 2024-07-10 15:30:46,457 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B/snapshots/62bd457b6fe961a42a631306577e622c83876cb6/config.json
[INFO|configuration_utils.py:800] 2024-07-10 15:30:46,458 >> Model config LlamaConfig {
  "_name_or_path": "meta-llama/Meta-Llama-3-8B",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "mlp_bias": false,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.42.3",
  "use_cache": true,
  "vocab_size": 128256
}
[INFO|modeling_utils.py:3556] 2024-07-10 15:30:46,480 >> loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B/snapshots/62bd457b6fe961a42a631306577e622c83876cb6/model.safetensors.index.json
[INFO|modeling_utils.py:1531] 2024-07-10 15:30:46,481 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1000] 2024-07-10 15:30:46,482 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}
[INFO|modeling_utils.py:4364] 2024-07-10 15:30:50,121 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4372] 2024-07-10 15:30:50,121 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at meta-llama/Meta-Llama-3-8B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:955] 2024-07-10 15:30:50,295 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3-8B/snapshots/62bd457b6fe961a42a631306577e622c83876cb6/generation_config.json
[INFO|configuration_utils.py:1000] 2024-07-10 15:30:50,295 >> Generate config GenerationConfig {
  "bos_token_id": 128000,
  "do_sample": true,
  "eos_token_id": 128001,
  "max_length": 4096,
  "temperature": 0.6,
  "top_p": 0.9
}
[INFO|checkpointing.py:103] 2024-07-10 15:30:50,302 >> Gradient checkpointing enabled.
[INFO|attention.py:80] 2024-07-10 15:30:50,303 >> Using torch SDPA for faster training and inference.
[INFO|adapter.py:302] 2024-07-10 15:30:50,303 >> Upcasting trainable params to float32.
[INFO|adapter.py:48] 2024-07-10 15:30:50,303 >> Fine-tuning method: Full
[INFO|loader.py:196] 2024-07-10 15:30:50,345 >> trainable params: 8,030,261,248 || all params: 8,030,261,248 || trainable%: 100.0000
07/10/2024 15:30:50 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/10/2024 15:30:50 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/10/2024 15:30:50 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/10/2024 15:30:50 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
07/10/2024 15:30:50 - INFO - llamafactory.model.loader - trainable params: 8,030,261,248 || all params: 8,030,261,248 || trainable%: 100.0000
07/10/2024 15:30:50 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/10/2024 15:30:50 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/10/2024 15:30:50 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
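The reported `trainable params: 8,030,261,248` can be reproduced from the `LlamaConfig` fields logged above. The sketch assumes the standard Llama decoder layout: q/k/v/o attention projections with grouped-query attention (8 key/value heads of dimension 4096 / 32 = 128), a gated SiLU MLP with three projections, two RMSNorm weight vectors per layer, a final norm, and an untied `lm_head` (`tie_word_embeddings: false`); with `attention_bias` and `mlp_bias` both false there are no bias terms. `LlamaShape` and `count_params` are illustrative names, not part of transformers or LLaMA-Factory. Because the log reports upcasting the trainable parameters to float32, the last line also converts the count to bytes at 4 bytes per parameter; that figure covers the weights only, not gradients, optimizer state, or activations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LlamaShape:
    # Values copied from the LlamaConfig printed in this log (hypothetical container).
    vocab_size: int = 128256
    hidden_size: int = 4096
    intermediate_size: int = 14336
    num_hidden_layers: int = 32
    num_attention_heads: int = 32
    num_key_value_heads: int = 8
    tie_word_embeddings: bool = False


def count_params(c: LlamaShape) -> int:
    """Count the weights of a bias-free Llama decoder (attention_bias/mlp_bias false)."""
    head_dim = c.hidden_size // c.num_attention_heads       # 128
    kv_dim = c.num_key_value_heads * head_dim               # 1024 (grouped-query attention)
    attn = 2 * c.hidden_size * c.hidden_size + 2 * c.hidden_size * kv_dim  # q, o + k, v
    mlp = 3 * c.hidden_size * c.intermediate_size           # gate, up, down projections
    norms = 2 * c.hidden_size                               # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    embed = c.vocab_size * c.hidden_size                    # input embedding matrix
    lm_head = 0 if c.tie_word_embeddings else embed         # untied output projection
    final_norm = c.hidden_size
    return embed + c.num_hidden_layers * per_layer + final_norm + lm_head


total = count_params(LlamaShape())
print(f"{total:,}")                            # 8,030,261,248
print(f"{total * 4 / 1e9:.1f} GB in float32")  # 32.1 GB
```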
07/10/2024 15:30:50 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
[INFO|trainer.py:642] 2024-07-10 15:30:50,350 >> Using auto half precision backend
07/10/2024 15:30:50 - INFO - llamafactory.model.loader - trainable params: 8,030,261,248 || all params: 8,030,261,248 || trainable%: 100.0000
07/10/2024 15:30:50 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/10/2024 15:30:50 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/10/2024 15:30:50 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/10/2024 15:30:50 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
07/10/2024 15:30:50 - INFO - llamafactory.model.loader - trainable params: 8,030,261,248 || all params: 8,030,261,248 || trainable%: 100.0000
07/10/2024 15:30:50 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/10/2024 15:30:50 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/10/2024 15:30:50 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/10/2024 15:30:50 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
07/10/2024 15:30:50 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/10/2024 15:30:50 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/10/2024 15:30:50 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/10/2024 15:30:50 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
07/10/2024 15:30:50 - INFO - llamafactory.model.loader - trainable params: 8,030,261,248 || all params: 8,030,261,248 || trainable%: 100.0000
07/10/2024 15:30:50 - INFO - llamafactory.model.loader - trainable params: 8,030,261,248 || all params: 8,030,261,248 || trainable%: 100.0000
07/10/2024 15:30:51 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/10/2024 15:30:51 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/10/2024 15:30:51 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/10/2024 15:30:51 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
07/10/2024 15:30:51 - INFO - llamafactory.model.loader - trainable params: 8,030,261,248 || all params: 8,030,261,248 || trainable%: 100.0000
07/10/2024 15:30:51 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/10/2024 15:30:51 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
07/10/2024 15:30:51 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/10/2024 15:30:51 - INFO - llamafactory.model.adapter - Fine-tuning method: Full
07/10/2024 15:30:51 - INFO - llamafactory.model.loader - trainable params: 8,030,261,248 || all params: 8,030,261,248 || trainable%: 100.0000
[INFO|trainer.py:2128] 2024-07-10 15:31:12,003 >> ***** Running training *****
[INFO|trainer.py:2129] 2024-07-10 15:31:12,003 >> Num examples = 19,880
[INFO|trainer.py:2130] 2024-07-10 15:31:12,003 >> Num Epochs = 5
[INFO|trainer.py:2131] 2024-07-10 15:31:12,003 >> Instantaneous batch size per device = 4
[INFO|trainer.py:2134] 2024-07-10 15:31:12,003 >> Total train batch size (w. parallel, distributed & accumulation) = 256
[INFO|trainer.py:2135] 2024-07-10 15:31:12,003 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2136] 2024-07-10 15:31:12,003 >> Total optimization steps = 385
[INFO|trainer.py:2137] 2024-07-10 15:31:12,004 >> Number of trainable parameters = 8,030,261,248
[INFO|callbacks.py:310] 2024-07-10 15:31:28,728 >> {'loss': 7.8034, 'learning_rate': 8.3333e-09, 'epoch': 0.01, 'throughput': 803.70}
[INFO|callbacks.py:310] 2024-07-10 15:31:41,798 >> {'loss': 7.7577, 'learning_rate': 1.6667e-08, 'epoch': 0.03, 'throughput': 905.46}
[INFO|callbacks.py:310] 2024-07-10 15:31:54,840 >> {'loss': 7.8132, 'learning_rate': 2.5000e-08, 'epoch': 0.04, 'throughput': 952.51}
[INFO|callbacks.py:310] 2024-07-10 15:32:07,885 >> {'loss': 7.7906, 'learning_rate': 3.3333e-08, 'epoch': 0.05, 'throughput': 967.80}
[INFO|callbacks.py:310] 2024-07-10 15:32:20,925 >> {'loss': 7.7763, 'learning_rate': 4.1667e-08, 'epoch': 0.06, 'throughput': 980.16}
[INFO|callbacks.py:310] 2024-07-10 15:32:33,999 >> {'loss': 7.7824, 'learning_rate': 5.0000e-08, 'epoch': 0.08, 'throughput': 995.20}
[INFO|callbacks.py:310] 2024-07-10 15:32:47,054 >> {'loss': 7.8330, 'learning_rate': 5.8333e-08, 'epoch': 0.09, 'throughput': 996.54}
[INFO|callbacks.py:310] 2024-07-10 15:33:00,116 >> {'loss': 7.6949, 'learning_rate': 6.6667e-08, 'epoch': 0.10, 'throughput': 994.24}
[INFO|callbacks.py:310] 2024-07-10 15:33:13,214 >> {'loss': 7.8336, 'learning_rate': 7.5000e-08, 'epoch': 0.12, 'throughput': 998.22}
[INFO|callbacks.py:310] 2024-07-10 15:33:26,241 >> {'loss': 7.7282, 'learning_rate': 8.3333e-08, 'epoch': 0.13, 'throughput': 1000.27}
[INFO|callbacks.py:310] 2024-07-10 15:33:39,293 >> {'loss': 7.6916, 'learning_rate': 9.1667e-08, 'epoch': 0.14, 'throughput': 1005.49}
[INFO|callbacks.py:310] 2024-07-10 15:33:52,303 >> {'loss': 7.7333, 'learning_rate': 1.0000e-07, 'epoch': 0.15, 'throughput': 1002.74}
[INFO|callbacks.py:310] 2024-07-10 15:34:05,346 >> {'loss': 7.6017, 'learning_rate': 1.0833e-07, 'epoch': 0.17, 'throughput': 1000.21}
[INFO|callbacks.py:310] 2024-07-10 15:34:18,384 >> {'loss': 7.6440, 'learning_rate': 1.1667e-07, 'epoch': 0.18, 'throughput': 1004.07}
[INFO|callbacks.py:310] 2024-07-10 15:34:31,502 >> {'loss': 7.5965, 'learning_rate': 1.2500e-07, 'epoch': 0.19, 'throughput': 1003.49}
[INFO|callbacks.py:310] 2024-07-10 15:34:44,580 >> {'loss': 7.5883, 'learning_rate': 1.3333e-07, 'epoch': 0.21, 'throughput': 1006.03}
[INFO|callbacks.py:310] 2024-07-10 15:34:57,629 >> {'loss': 7.2464, 'learning_rate': 1.4167e-07, 'epoch': 0.22, 'throughput': 1005.85}
[INFO|callbacks.py:310] 2024-07-10 15:35:10,663 >> {'loss': 7.3133, 'learning_rate': 1.5000e-07, 'epoch': 0.23, 'throughput': 1008.31}
[INFO|callbacks.py:310] 2024-07-10 15:35:23,706 >> {'loss': 7.2133, 'learning_rate': 1.5833e-07, 'epoch': 0.24, 'throughput': 1006.28}
[INFO|callbacks.py:310] 2024-07-10 15:35:36,722 >> {'loss': 7.2431, 'learning_rate': 1.6667e-07, 'epoch': 0.26, 'throughput': 1006.36}
[INFO|callbacks.py:310] 2024-07-10 15:35:49,807 >> {'loss': 7.1875, 'learning_rate': 1.7500e-07, 'epoch': 0.27, 'throughput': 1009.41}
[INFO|callbacks.py:310] 2024-07-10 15:36:02,888 >> {'loss': 7.0659, 'learning_rate': 1.8333e-07, 'epoch': 0.28, 'throughput': 1011.98}
[INFO|callbacks.py:310] 2024-07-10 15:36:16,015 >> {'loss': 6.3595, 'learning_rate': 1.9167e-07, 'epoch': 0.30, 'throughput': 1012.60}
[INFO|callbacks.py:310] 2024-07-10 15:36:29,093 >> {'loss': 6.0417, 'learning_rate': 2.0000e-07, 'epoch': 0.31, 'throughput': 1015.95}
[INFO|callbacks.py:310] 2024-07-10 15:36:42,120 >> {'loss': 5.9894, 'learning_rate': 2.0833e-07, 'epoch': 0.32, 'throughput': 1016.08}
[INFO|callbacks.py:310] 2024-07-10 15:36:55,176 >> {'loss': 5.9259, 'learning_rate': 2.1667e-07, 'epoch': 0.33, 'throughput': 1018.08}
[INFO|callbacks.py:310] 2024-07-10 15:37:08,224 >> {'loss': 5.8983, 'learning_rate': 2.2500e-07, 'epoch': 0.35, 'throughput': 1018.88}
[INFO|callbacks.py:310] 2024-07-10 15:37:21,315 >> {'loss': 5.6848, 'learning_rate': 2.3333e-07, 'epoch': 0.36, 'throughput': 1019.07}
[INFO|callbacks.py:310] 2024-07-10 15:37:34,363 >> {'loss': 5.5649, 'learning_rate': 2.4167e-07, 'epoch': 0.37, 'throughput': 1018.19}
[INFO|callbacks.py:310] 2024-07-10 15:37:47,442 >> {'loss': 5.4642, 'learning_rate': 2.5000e-07, 'epoch': 0.39, 'throughput': 1018.74}
[INFO|callbacks.py:310] 2024-07-10 15:38:00,541 >> {'loss': 4.7955, 'learning_rate': 2.5833e-07, 'epoch': 0.40, 'throughput': 1019.13}
[INFO|callbacks.py:310] 2024-07-10 15:38:13,587 >> {'loss': 2.8339, 'learning_rate': 2.6667e-07, 'epoch': 0.41, 'throughput': 1019.78}
[INFO|callbacks.py:310] 2024-07-10 15:38:26,680 >> {'loss': 2.4477, 'learning_rate': 2.7500e-07, 'epoch': 0.42, 'throughput': 1021.60}
[INFO|callbacks.py:310] 2024-07-10 15:38:39,757 >> {'loss': 2.3331, 'learning_rate': 2.8333e-07, 'epoch': 0.44, 'throughput': 1023.57}
[INFO|callbacks.py:310] 2024-07-10 15:38:52,796 >> {'loss': 2.2143, 'learning_rate': 2.9167e-07, 'epoch': 0.45, 'throughput': 1024.12}
[INFO|callbacks.py:310] 2024-07-10 15:39:05,831 >> {'loss': 2.0067, 'learning_rate': 3.0000e-07, 'epoch': 0.46, 'throughput': 1023.09}
[INFO|callbacks.py:310] 2024-07-10 15:39:18,925 >> {'loss': 1.7702, 'learning_rate': 3.0833e-07, 'epoch': 0.48, 'throughput': 1022.13}
[INFO|callbacks.py:310] 2024-07-10 15:39:31,996 >> {'loss': 1.5557, 'learning_rate': 3.1667e-07, 'epoch': 0.49, 'throughput': 1022.16}
[INFO|callbacks.py:310] 2024-07-10 15:39:45,035 >> {'loss': 1.3024, 'learning_rate': 3.2500e-07, 'epoch': 0.50, 'throughput': 1022.51}
[INFO|callbacks.py:310] 2024-07-10 15:39:58,045 >> {'loss': 1.1652, 'learning_rate': 3.3333e-07, 'epoch': 0.51, 'throughput': 1022.95}
[INFO|callbacks.py:310] 2024-07-10 15:40:11,137 >> {'loss': 0.6839, 'learning_rate': 3.4167e-07, 'epoch': 0.53, 'throughput': 1024.05}
[INFO|callbacks.py:310] 2024-07-10 15:40:24,186 >> {'loss': 0.4774, 'learning_rate': 3.5000e-07, 'epoch': 0.54, 'throughput': 1024.88}
[INFO|callbacks.py:310] 2024-07-10 15:40:37,254 >> {'loss': 0.3841, 'learning_rate': 3.5833e-07, 'epoch': 0.55, 'throughput': 1024.68}
[INFO|callbacks.py:310] 2024-07-10 15:40:50,317 >> {'loss': 0.3588, 'learning_rate': 3.6667e-07, 'epoch': 0.57, 'throughput': 1025.16}
[INFO|callbacks.py:310] 2024-07-10 15:41:03,428 >> {'loss': 0.3628, 'learning_rate': 3.7500e-07, 'epoch': 0.58, 'throughput': 1025.76}
[INFO|callbacks.py:310] 2024-07-10 15:41:16,474 >> {'loss': 0.3426, 'learning_rate': 3.8333e-07, 'epoch': 0.59, 'throughput': 1025.64}
[INFO|callbacks.py:310] 2024-07-10 15:41:29,543 >> {'loss': 0.3279, 'learning_rate': 3.9167e-07, 'epoch': 0.60, 'throughput': 1026.06}
[INFO|callbacks.py:310] 2024-07-10 15:41:42,615 >> {'loss': 0.3947, 'learning_rate': 4.0000e-07, 'epoch': 0.62, 'throughput': 1026.56}
[INFO|callbacks.py:310] 2024-07-10 15:41:55,704 >> {'loss': 0.3075, 'learning_rate': 4.0833e-07, 'epoch': 0.63, 'throughput': 1027.66}
[INFO|callbacks.py:310] 2024-07-10 15:42:08,756 >> {'loss': 0.3236, 'learning_rate': 4.1667e-07, 'epoch': 0.64, 'throughput': 1028.43}
[INFO|callbacks.py:310] 2024-07-10 15:42:21,803 >> {'loss': 0.3557, 'learning_rate': 4.2500e-07, 'epoch': 0.66, 'throughput': 1028.37}
[INFO|callbacks.py:310] 2024-07-10 15:42:34,889 >> {'loss': 0.4008, 'learning_rate': 4.3333e-07, 'epoch': 0.67, 'throughput': 1028.30}
[INFO|callbacks.py:310] 2024-07-10 15:42:47,969 >> {'loss': 0.3586, 'learning_rate': 4.4167e-07, 'epoch': 0.68, 'throughput': 1028.93}
[INFO|callbacks.py:310] 2024-07-10 15:43:00,979 >> {'loss': 0.3023, 'learning_rate': 4.5000e-07, 'epoch': 0.69, 'throughput': 1027.78}
[INFO|callbacks.py:310] 2024-07-10 15:43:14,044 >> {'loss': 0.3547, 'learning_rate': 4.5833e-07, 'epoch': 0.71, 'throughput': 1028.91}
[INFO|callbacks.py:310] 2024-07-10 15:43:27,066 >> {'loss': 0.3846, 'learning_rate': 4.6667e-07, 'epoch': 0.72, 'throughput': 1028.23}
[INFO|callbacks.py:310] 2024-07-10 15:43:40,122 >> {'loss': 0.3743, 'learning_rate': 4.7500e-07, 'epoch': 0.73, 'throughput': 1028.85}
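The `Running training` header is consistent with eight data-parallel ranks (process ranks 0 to 7): 4 examples per device × 8 ranks × 8 accumulation steps = 256 examples per optimizer step. The 385 total steps follow if each epoch counts only whole accumulation groups. This sketch mirrors, as far as we understand it, the step arithmetic of the transformers `Trainer` in this version; `trainer_schedule` is a hypothetical helper, and the even split of micro-batches across ranks (rounding up) is an assumption about how the data loader is sharded.

```python
import math


def trainer_schedule(num_examples: int, per_device_batch: int, world_size: int,
                     grad_accum: int, num_epochs: float) -> tuple[int, int, int]:
    """Return (global batch size, optimizer steps per epoch, total optimizer steps)."""
    global_batch = per_device_batch * world_size * grad_accum
    # Assumed sharding: micro-batches are split evenly across ranks, rounding up.
    micro_batches_per_rank = math.ceil(math.ceil(num_examples / per_device_batch) / world_size)
    # Floor division: a trailing partial accumulation group is not counted as a step.
    steps_per_epoch = max(micro_batches_per_rank // grad_accum, 1)
    total_steps = math.ceil(num_epochs * steps_per_epoch)
    return global_batch, steps_per_epoch, total_steps


# Values from the "Running training" header; world_size=8 from process ranks 0-7.
print(trainer_schedule(19_880, 4, 8, 8, 5))  # (256, 77, 385)
```

Since 19,880 / 256 ≈ 77.7, roughly 0.7 of a global batch per epoch does not form a full optimizer step in this count.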
[INFO|callbacks.py:310] 2024-07-10 15:43:53,200 >> {'loss': 0.3091, 'learning_rate': 4.8333e-07, 'epoch': 0.75, 'throughput': 1029.54}
[INFO|callbacks.py:310] 2024-07-10 15:44:06,285 >> {'loss': 0.3094, 'learning_rate': 4.9167e-07, 'epoch': 0.76, 'throughput': 1028.92}
[INFO|callbacks.py:310] 2024-07-10 15:44:19,363 >> {'loss': 0.3309, 'learning_rate': 5.0000e-07, 'epoch': 0.77, 'throughput': 1029.14}
[INFO|callbacks.py:310] 2024-07-10 15:44:32,413 >> {'loss': 0.3276, 'learning_rate': 5.0833e-07, 'epoch': 0.78, 'throughput': 1029.68}
[INFO|callbacks.py:310] 2024-07-10 15:44:45,451 >> {'loss': 0.3084, 'learning_rate': 5.1667e-07, 'epoch': 0.80, 'throughput': 1029.34}
[INFO|callbacks.py:310] 2024-07-10 15:44:58,517 >> {'loss': 0.3182, 'learning_rate': 5.2500e-07, 'epoch': 0.81, 'throughput': 1029.83}
[INFO|callbacks.py:310] 2024-07-10 15:45:11,542 >> {'loss': 0.3469, 'learning_rate': 5.3333e-07, 'epoch': 0.82, 'throughput': 1029.25}
[INFO|callbacks.py:310] 2024-07-10 15:45:24,606 >> {'loss': 0.3253, 'learning_rate': 5.4167e-07, 'epoch': 0.84, 'throughput': 1029.85}
[INFO|callbacks.py:310] 2024-07-10 15:45:37,676 >> {'loss': 0.2746, 'learning_rate': 5.5000e-07, 'epoch': 0.85, 'throughput': 1030.30}
[INFO|callbacks.py:310] 2024-07-10 15:45:50,772 >> {'loss': 0.2893, 'learning_rate': 5.5833e-07, 'epoch': 0.86, 'throughput': 1031.01}
[INFO|callbacks.py:310] 2024-07-10 15:46:03,815 >> {'loss': 0.2827, 'learning_rate': 5.6667e-07, 'epoch': 0.87, 'throughput': 1030.43}
[INFO|callbacks.py:310] 2024-07-10 15:46:16,892 >> {'loss': 0.2978, 'learning_rate': 5.7500e-07, 'epoch': 0.89, 'throughput': 1030.71}
[INFO|callbacks.py:310] 2024-07-10 15:46:29,914 >> {'loss': 0.2703, 'learning_rate': 5.8333e-07, 'epoch': 0.90, 'throughput': 1029.78}
[INFO|callbacks.py:310] 2024-07-10 15:46:42,975 >> {'loss': 0.2968, 'learning_rate': 5.9167e-07, 'epoch': 0.91, 'throughput': 1029.88}
[INFO|callbacks.py:310] 2024-07-10 15:46:56,043 >> {'loss': 0.3035, 'learning_rate': 6.0000e-07, 'epoch': 0.93, 'throughput': 1030.06}
[INFO|callbacks.py:310] 2024-07-10 15:47:09,118 >> {'loss': 0.3211, 'learning_rate': 6.0833e-07, 'epoch': 0.94, 'throughput': 1030.07}
[INFO|callbacks.py:310] 2024-07-10 15:47:22,193 >> {'loss': 0.2913, 'learning_rate': 6.1667e-07, 'epoch': 0.95, 'throughput': 1030.00}
[INFO|callbacks.py:310] 2024-07-10 15:47:35,264 >> {'loss': 0.2817, 'learning_rate': 6.2500e-07, 'epoch': 0.96, 'throughput': 1029.88}
[INFO|callbacks.py:310] 2024-07-10 15:47:48,300 >> {'loss': 0.2827, 'learning_rate': 6.3333e-07, 'epoch': 0.98, 'throughput': 1029.93}
[INFO|callbacks.py:310] 2024-07-10 15:48:01,345 >> {'loss': 0.2290, 'learning_rate': 6.4167e-07, 'epoch': 0.99, 'throughput': 1029.81}
[INFO|callbacks.py:310] 2024-07-10 15:48:14,403 >> {'loss': 0.2503, 'learning_rate': 6.5000e-07, 'epoch': 1.00, 'throughput': 1030.33}
[INFO|callbacks.py:310] 2024-07-10 15:48:27,482 >> {'loss': 0.2453, 'learning_rate': 6.5833e-07, 'epoch': 1.02, 'throughput': 1030.76}
[INFO|callbacks.py:310] 2024-07-10 15:48:40,533 >> {'loss': 0.2167, 'learning_rate': 6.6667e-07, 'epoch': 1.03, 'throughput': 1030.29}
[INFO|callbacks.py:310] 2024-07-10 15:48:53,615 >> {'loss': 0.2361, 'learning_rate': 6.7500e-07, 'epoch': 1.04, 'throughput': 1030.47}
[INFO|callbacks.py:310] 2024-07-10 15:49:06,694 >> {'loss': 0.2248, 'learning_rate': 6.8333e-07, 'epoch': 1.05, 'throughput': 1030.91}
[INFO|callbacks.py:310] 2024-07-10 15:49:19,720 >> {'loss': 0.2491, 'learning_rate': 6.9167e-07, 'epoch': 1.07, 'throughput': 1030.71}
[INFO|callbacks.py:310] 2024-07-10 15:49:32,815 >> {'loss': 0.2352, 'learning_rate': 7.0000e-07, 'epoch': 1.08, 'throughput': 1031.10}
[INFO|callbacks.py:310] 2024-07-10 15:49:45,904 >> {'loss': 0.2365, 'learning_rate': 7.0833e-07, 'epoch': 1.09, 'throughput': 1031.51}
[INFO|callbacks.py:310] 2024-07-10 15:49:58,930 >> {'loss': 0.2170, 'learning_rate': 7.1667e-07, 'epoch': 1.11, 'throughput': 1031.65}
[INFO|callbacks.py:310] 2024-07-10 15:50:11,963 >> {'loss': 0.2258, 'learning_rate': 7.2500e-07, 'epoch': 1.12, 'throughput': 1030.83}
[INFO|callbacks.py:310] 2024-07-10 15:50:25,032 >> {'loss': 0.2450, 'learning_rate': 7.3333e-07, 'epoch': 1.13, 'throughput': 1031.53}
[INFO|callbacks.py:310] 2024-07-10 15:50:38,116 >> {'loss': 0.3132, 'learning_rate': 7.4167e-07, 'epoch': 1.14, 'throughput': 1031.26}
[INFO|callbacks.py:310] 2024-07-10 15:50:51,182 >> {'loss': 0.2840, 'learning_rate': 7.5000e-07, 'epoch': 1.16, 'throughput': 1031.93}
[INFO|callbacks.py:310] 2024-07-10 15:51:04,209 >> {'loss': 0.1933, 'learning_rate': 7.5833e-07, 'epoch': 1.17, 'throughput': 1032.15}
[INFO|callbacks.py:310] 2024-07-10 15:51:17,279 >> {'loss': 0.2154, 'learning_rate': 7.6667e-07, 'epoch': 1.18, 'throughput': 1032.42}
[INFO|callbacks.py:310] 2024-07-10 15:51:30,278 >> {'loss': 0.2064, 'learning_rate': 7.7500e-07, 'epoch': 1.20, 'throughput': 1032.18}
[INFO|callbacks.py:310] 2024-07-10 15:51:43,357 >> {'loss': 0.2038, 'learning_rate': 7.8333e-07, 'epoch': 1.21, 'throughput': 1032.52}
[INFO|callbacks.py:310] 2024-07-10 15:51:56,455 >> {'loss': 0.2152, 'learning_rate': 7.9167e-07, 'epoch': 1.22, 'throughput': 1033.74}
[INFO|callbacks.py:310] 2024-07-10 15:52:09,536 >> {'loss': 0.1961, 'learning_rate': 8.0000e-07, 'epoch': 1.23, 'throughput': 1033.85}
[INFO|callbacks.py:310] 2024-07-10 15:52:22,616 >> {'loss': 0.1772, 'learning_rate': 8.0833e-07, 'epoch': 1.25, 'throughput': 1033.81}
[INFO|callbacks.py:310] 2024-07-10 15:52:35,723 >> {'loss': 0.1846, 'learning_rate': 8.1667e-07, 'epoch': 1.26, 'throughput': 1033.97}
[INFO|callbacks.py:310] 2024-07-10 15:52:48,763 >> {'loss': 0.1823, 'learning_rate': 8.2500e-07, 'epoch': 1.27, 'throughput': 1034.09}
[INFO|callbacks.py:310] 2024-07-10 15:53:01,762 >> {'loss': 0.1794, 'learning_rate': 8.3333e-07, 'epoch': 1.29, 'throughput': 1033.35}
[INFO|callbacks.py:310] 2024-07-10 15:53:14,835 >> {'loss': 0.2106, 'learning_rate': 8.4167e-07, 'epoch': 1.30, 'throughput': 1033.90}
[INFO|callbacks.py:310] 2024-07-10 15:53:27,876 >> {'loss': 0.2123, 'learning_rate': 8.5000e-07, 'epoch': 1.31, 'throughput': 1033.68}
[INFO|callbacks.py:310] 2024-07-10 15:53:40,937 >> {'loss': 0.2413, 'learning_rate': 8.5833e-07, 'epoch': 1.32, 'throughput': 1033.33}
[INFO|callbacks.py:310] 2024-07-10 15:53:54,008 >> {'loss': 0.2334, 'learning_rate': 8.6667e-07, 'epoch': 1.34, 'throughput': 1032.97}
[INFO|callbacks.py:310] 2024-07-10 15:54:07,043 >> {'loss': 0.2069, 'learning_rate': 8.7500e-07, 'epoch': 1.35, 'throughput': 1032.70}
[INFO|callbacks.py:310] 2024-07-10 15:54:20,100 >> {'loss': 0.2262, 'learning_rate': 8.8333e-07, 'epoch': 1.36, 'throughput': 1032.67}
[INFO|callbacks.py:310] 2024-07-10 15:54:33,175 >> {'loss': 0.1718, 'learning_rate': 8.9167e-07, 'epoch': 1.38, 'throughput': 1032.39}
[INFO|callbacks.py:310] 2024-07-10 15:54:46,250 >> {'loss': 0.2040, 'learning_rate': 9.0000e-07, 'epoch': 1.39, 'throughput': 1032.58}
[INFO|callbacks.py:310] 2024-07-10 15:54:59,311 >> {'loss': 0.1849, 'learning_rate': 9.0833e-07, 'epoch': 1.40, 'throughput': 1032.91}
[INFO|callbacks.py:310] 2024-07-10 15:55:12,379 >> {'loss': 0.2028, 'learning_rate': 9.1667e-07, 'epoch': 1.41, 'throughput': 1033.00}
[INFO|callbacks.py:310] 2024-07-10 15:55:25,480 >> {'loss': 0.1790, 'learning_rate': 9.2500e-07, 'epoch': 1.43, 'throughput': 1033.15}
[INFO|callbacks.py:310] 2024-07-10 15:55:38,555 >> {'loss': 0.1813, 'learning_rate': 9.3333e-07, 'epoch': 1.44, 'throughput': 1033.22}
[INFO|callbacks.py:310] 2024-07-10 15:55:51,609 >> {'loss': 0.1955, 'learning_rate': 9.4167e-07, 'epoch': 1.45, 'throughput': 1033.14}
[INFO|callbacks.py:310] 2024-07-10 15:56:04,654 >> {'loss': 0.1577, 'learning_rate': 9.5000e-07, 'epoch': 1.47, 'throughput': 1032.82}
[INFO|callbacks.py:310] 2024-07-10 15:56:17,693 >> {'loss': 0.1509, 'learning_rate': 9.5833e-07, 'epoch': 1.48, 'throughput': 1032.48}
[INFO|callbacks.py:310] 2024-07-10 15:56:30,731 >> {'loss': 0.2052, 'learning_rate': 9.6667e-07, 'epoch': 1.49, 'throughput': 1031.98}
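The logged `learning_rate` grows by a constant 8.3333e-09 per record. With 385 total steps over 5 epochs (77 per epoch), the `epoch` field advancing by about 0.013 per record is consistent with one record per optimizer step, and a constant increment is what a linear warmup, lr(step) = peak_lr × step / warmup_steps, produces. The slope equals peak_lr / warmup_steps, so this log alone does not pin down either value. A minimal sketch that parses the callback dicts with `ast.literal_eval` and checks the linear relation, using the first six records copied from this log (`parse_metrics` is a hypothetical helper):

```python
import ast
import math
import re

# First six callback records, copied verbatim from this log.
LOG = """\
[INFO|callbacks.py:310] 2024-07-10 15:31:28,728 >> {'loss': 7.8034, 'learning_rate': 8.3333e-09, 'epoch': 0.01, 'throughput': 803.70}
[INFO|callbacks.py:310] 2024-07-10 15:31:41,798 >> {'loss': 7.7577, 'learning_rate': 1.6667e-08, 'epoch': 0.03, 'throughput': 905.46}
[INFO|callbacks.py:310] 2024-07-10 15:31:54,840 >> {'loss': 7.8132, 'learning_rate': 2.5000e-08, 'epoch': 0.04, 'throughput': 952.51}
[INFO|callbacks.py:310] 2024-07-10 15:32:07,885 >> {'loss': 7.7906, 'learning_rate': 3.3333e-08, 'epoch': 0.05, 'throughput': 967.80}
[INFO|callbacks.py:310] 2024-07-10 15:32:20,925 >> {'loss': 7.7763, 'learning_rate': 4.1667e-08, 'epoch': 0.06, 'throughput': 980.16}
[INFO|callbacks.py:310] 2024-07-10 15:32:33,999 >> {'loss': 7.7824, 'learning_rate': 5.0000e-08, 'epoch': 0.08, 'throughput': 995.20}
"""

# The metrics dict follows ">> " and contains no nested braces.
METRICS = re.compile(r">> (\{'loss'.*?\})")


def parse_metrics(text: str) -> list[dict]:
    """Return the callback metric dicts in log order (hypothetical helper)."""
    return [ast.literal_eval(m.group(1)) for m in METRICS.finditer(text)]


metrics = parse_metrics(LOG)
# Linear warmup: lr(step) = peak_lr * step / warmup_steps, so lr / step should be constant.
slopes = [m["learning_rate"] / step for step, m in enumerate(metrics, start=1)]
assert all(math.isclose(s, slopes[0], rel_tol=1e-3) for s in slopes)
print(f"learning-rate increment per step: {slopes[0]:.4e}")  # 8.3333e-09
```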
[INFO|callbacks.py:310] 2024-07-10 15:56:43,807 >> {'loss': 0.1576, 'learning_rate': 9.7500e-07, 'epoch': 1.50, 'throughput': 1031.99}
[INFO|callbacks.py:310] 2024-07-10 15:56:56,876 >> {'loss': 0.1459, 'learning_rate': 9.8333e-07, 'epoch': 1.52, 'throughput': 1031.67}
[INFO|callbacks.py:310] 2024-07-10 15:57:09,932 >> {'loss': 0.2694, 'learning_rate': 9.9167e-07, 'epoch': 1.53, 'throughput': 1031.92}
[INFO|callbacks.py:310] 2024-07-10 15:57:22,991 >> {'loss': 0.1891, 'learning_rate': 1.0000e-06, 'epoch': 1.54, 'throughput': 1031.95}
[INFO|callbacks.py:310] 2024-07-10 15:57:36,004 >> {'loss': 0.1655, 'learning_rate': 1.0083e-06, 'epoch': 1.56, 'throughput': 1031.72}
[INFO|callbacks.py:310] 2024-07-10 15:57:49,058 >> {'loss': 0.1534, 'learning_rate': 1.0167e-06, 'epoch': 1.57, 'throughput': 1031.72}
[INFO|callbacks.py:310] 2024-07-10 15:58:02,124 >> {'loss': 0.1373, 'learning_rate': 1.0250e-06, 'epoch': 1.58, 'throughput': 1031.81}
[INFO|callbacks.py:310] 2024-07-10 15:58:15,198 >> {'loss': 0.1528, 'learning_rate': 1.0333e-06, 'epoch': 1.59, 'throughput': 1031.84}
[INFO|callbacks.py:310] 2024-07-10 15:58:28,311 >> {'loss': 0.2017, 'learning_rate': 1.0417e-06, 'epoch': 1.61, 'throughput': 1032.20}
[INFO|callbacks.py:310] 2024-07-10 15:58:41,392 >> {'loss': 0.1554, 'learning_rate': 1.0500e-06, 'epoch': 1.62, 'throughput': 1032.47}
[INFO|callbacks.py:310] 2024-07-10 15:58:54,437 >> {'loss': 0.1332, 'learning_rate': 1.0583e-06, 'epoch': 1.63, 'throughput': 1032.68}
[INFO|callbacks.py:310] 2024-07-10 15:59:07,498 >> {'loss': 0.1150, 'learning_rate': 1.0667e-06, 'epoch': 1.65, 'throughput': 1033.04}
[INFO|callbacks.py:310] 2024-07-10 15:59:20,535 >> {'loss': 0.1190, 'learning_rate': 1.0750e-06, 'epoch': 1.66, 'throughput': 1032.57}
[INFO|callbacks.py:310] 2024-07-10 15:59:33,580 >> {'loss': 0.1164, 'learning_rate': 1.0833e-06, 'epoch': 1.67, 'throughput': 1032.98}
[INFO|callbacks.py:310] 2024-07-10 15:59:46,641 >> {'loss': 0.1981, 'learning_rate': 1.0917e-06, 'epoch': 1.68, 'throughput': 1033.03}
[INFO|callbacks.py:310] 2024-07-10 15:59:59,719 >> {'loss': 0.1680, 'learning_rate': 1.1000e-06, 'epoch': 1.70, 'throughput': 1032.86}
[INFO|callbacks.py:310] 2024-07-10 16:00:12,811 >> {'loss': 0.0741, 'learning_rate': 1.1083e-06, 'epoch': 1.71, 'throughput': 1032.96}
[INFO|callbacks.py:310] 2024-07-10 16:00:25,847 >> {'loss': 0.1847, 'learning_rate': 1.1167e-06, 'epoch': 1.72, 'throughput': 1032.74}
[INFO|callbacks.py:310] 2024-07-10 16:00:38,940 >> {'loss': 0.1080, 'learning_rate': 1.1250e-06, 'epoch': 1.74, 'throughput': 1032.88}
[INFO|callbacks.py:310] 2024-07-10 16:00:51,967 >> {'loss': 0.1214, 'learning_rate': 1.1333e-06, 'epoch': 1.75, 'throughput': 1032.96}
[INFO|callbacks.py:310] 2024-07-10 16:01:05,023 >> {'loss': 0.1252, 'learning_rate': 1.1417e-06, 'epoch': 1.76, 'throughput': 1033.06}
[INFO|callbacks.py:310] 2024-07-10 16:01:18,067 >> {'loss': 0.1440, 'learning_rate': 1.1500e-06, 'epoch': 1.77, 'throughput': 1032.84}
[INFO|callbacks.py:310] 2024-07-10 16:01:31,137 >> {'loss': 0.1269, 'learning_rate': 1.1583e-06, 'epoch': 1.79, 'throughput': 1032.79}
[INFO|callbacks.py:310] 2024-07-10 16:01:44,213 >> {'loss': 0.1283, 'learning_rate': 1.1667e-06, 'epoch': 1.80, 'throughput': 1032.81}
[INFO|callbacks.py:310] 2024-07-10 16:01:57,241 >> {'loss': 0.0929, 'learning_rate': 1.1750e-06, 'epoch': 1.81, 'throughput': 1032.78}
[INFO|callbacks.py:310] 2024-07-10 16:02:10,258 >> {'loss': 0.1349, 'learning_rate': 1.1833e-06, 'epoch': 1.83, 'throughput': 1032.56}
[INFO|callbacks.py:310] 2024-07-10 16:02:23,316 >> {'loss': 0.1277, 'learning_rate': 1.1917e-06, 'epoch': 1.84, 'throughput': 1032.33}
[INFO|callbacks.py:310] 2024-07-10 16:02:36,363 >> {'loss': 0.1585, 'learning_rate': 1.2000e-06, 'epoch': 1.85, 'throughput': 1032.35}
[INFO|callbacks.py:310] 2024-07-10 16:02:49,387 >> {'loss': 0.1468, 'learning_rate': 1.2083e-06, 'epoch': 1.86, 'throughput': 1031.96}
[INFO|callbacks.py:310] 2024-07-10 16:03:02,421 >> {'loss': 0.1049, 'learning_rate': 1.2167e-06, 'epoch': 1.88, 'throughput': 1031.92}
[INFO|callbacks.py:310] 2024-07-10 16:03:15,527 >> {'loss': 0.1297, 'learning_rate': 1.2250e-06, 'epoch': 1.89, 'throughput': 1031.96}
[INFO|callbacks.py:310] 2024-07-10 16:03:28,588 >> {'loss': 0.1111, 'learning_rate': 1.2333e-06, 'epoch': 1.90, 'throughput': 1031.79}
[INFO|callbacks.py:310] 2024-07-10 16:03:41,633 >> {'loss': 0.1202, 'learning_rate': 1.2417e-06, 'epoch': 1.92, 'throughput': 1032.01}
[INFO|callbacks.py:310] 2024-07-10 16:03:54,708 >> {'loss': 0.0829, 'learning_rate': 1.2500e-06, 'epoch': 1.93, 'throughput': 1032.05}
[INFO|callbacks.py:310] 2024-07-10 16:04:07,790 >> {'loss': 0.1119, 'learning_rate': 1.2583e-06, 'epoch': 1.94, 'throughput': 1032.55}
[INFO|callbacks.py:310] 2024-07-10 16:04:20,848 >> {'loss': 0.1144, 'learning_rate': 1.2667e-06, 'epoch': 1.95, 'throughput': 1032.51}
[INFO|callbacks.py:310] 2024-07-10 16:04:33,876 >> {'loss': 0.1170, 'learning_rate': 1.2750e-06, 'epoch': 1.97, 'throughput': 1031.96}
[INFO|callbacks.py:310] 2024-07-10 16:04:46,937 >> {'loss': 0.0998, 'learning_rate': 1.2833e-06, 'epoch': 1.98, 'throughput': 1031.63}
[INFO|callbacks.py:310] 2024-07-10 16:05:00,031 >> {'loss': 0.1384, 'learning_rate': 1.2917e-06, 'epoch': 1.99, 'throughput': 1031.88}
[INFO|callbacks.py:310] 2024-07-10 16:05:13,054 >> {'loss': 0.1157, 'learning_rate': 1.3000e-06, 'epoch': 2.01, 'throughput': 1031.69}
[INFO|callbacks.py:310] 2024-07-10 16:05:26,109 >> {'loss': 0.0696, 'learning_rate': 1.3083e-06, 'epoch': 2.02, 'throughput': 1031.58}
[INFO|callbacks.py:310] 2024-07-10 16:05:39,158 >> {'loss': 0.0665, 'learning_rate': 1.3167e-06, 'epoch': 2.03, 'throughput': 1031.54}
[INFO|callbacks.py:310] 2024-07-10 16:05:52,234 >> {'loss': 0.0783, 'learning_rate': 1.3250e-06, 'epoch': 2.05, 'throughput': 1031.64}
[INFO|callbacks.py:310] 2024-07-10 16:06:05,266 >> {'loss': 0.0749, 'learning_rate': 1.3333e-06, 'epoch': 2.06, 'throughput': 1031.29}
[INFO|callbacks.py:310] 2024-07-10 16:06:18,321 >> {'loss': 0.0731, 'learning_rate': 1.3417e-06, 'epoch': 2.07, 'throughput': 1031.12}
[INFO|callbacks.py:310] 2024-07-10 16:06:31,379 >> {'loss': 0.0913, 'learning_rate': 1.3500e-06, 'epoch': 2.08, 'throughput': 1030.87}
[INFO|callbacks.py:310] 2024-07-10 16:06:44,451 >> {'loss': 0.0521, 'learning_rate': 1.3583e-06, 'epoch': 2.10, 'throughput': 1030.82}
[INFO|callbacks.py:310] 2024-07-10 16:06:57,506 >> {'loss': 0.0680, 'learning_rate': 1.3667e-06, 'epoch': 2.11, 'throughput': 1030.71}
[INFO|callbacks.py:310] 2024-07-10 16:07:10,589 >> {'loss': 0.0686, 'learning_rate': 1.3750e-06, 'epoch': 2.12, 'throughput': 1030.53}
[INFO|callbacks.py:310] 2024-07-10 16:07:23,614 >> {'loss': 0.0545, 'learning_rate': 1.3833e-06, 'epoch': 2.14, 'throughput': 1030.54}
[INFO|callbacks.py:310] 2024-07-10 16:07:36,662 >> {'loss': 0.0347, 'learning_rate': 1.3917e-06, 'epoch': 2.15, 'throughput': 1030.37}
[INFO|callbacks.py:310] 2024-07-10 16:07:49,739 >> {'loss': 0.0993, 'learning_rate': 1.4000e-06, 'epoch': 2.16, 'throughput': 1030.36}
[INFO|callbacks.py:310] 2024-07-10 16:08:02,845 >> {'loss': 0.1059, 'learning_rate': 1.4083e-06, 'epoch': 2.17, 'throughput': 1030.18}
[INFO|callbacks.py:310] 2024-07-10 16:08:15,921 >> {'loss': 0.0890, 'learning_rate': 1.4167e-06, 'epoch': 2.19, 'throughput': 1030.24}
[INFO|callbacks.py:310] 2024-07-10 16:08:28,943 >> {'loss': 0.0379, 'learning_rate': 1.4250e-06, 'epoch': 2.20, 'throughput': 1030.07}
[INFO|callbacks.py:310] 2024-07-10 16:08:41,971 >> {'loss': 0.0626, 'learning_rate': 1.4333e-06, 'epoch': 2.21, 'throughput': 1030.20}
[INFO|callbacks.py:310] 2024-07-10 16:08:55,075 >> {'loss': 0.0957, 'learning_rate': 1.4417e-06, 'epoch': 2.23, 'throughput': 1030.64}
[INFO|callbacks.py:310] 2024-07-10 16:09:08,128 >> {'loss': 0.0636, 'learning_rate': 1.4500e-06, 'epoch': 2.24, 'throughput': 1030.75}
[INFO|callbacks.py:310] 2024-07-10 16:09:21,197 >> {'loss': 0.0740, 'learning_rate': 1.4583e-06, 'epoch': 2.25, 'throughput': 1030.71}
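Each `callbacks.py` record in this log is a timestamp followed by a Python dict literal, so the records can be recovered with a regular expression even when several of them share one line or a single record wraps across lines. A minimal sketch, assuming only the record format visible here; `parse_records` and `mean_seconds_between` are hypothetical helper names, not part of LLaMA-Factory:

```python
import ast
import re
from datetime import datetime

# One LLaMA-Factory callbacks.py record as it appears in this log:
#   [INFO|callbacks.py:NNN] <date> <time> >> {'loss': ..., ...}
# \s+ between fields tolerates records run together or wrapped mid-record.
RECORD_RE = re.compile(
    r"\[INFO\|callbacks\.py:\d+\]\s+"
    r"(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2},\d{3})\s+>>\s+"
    r"(\{'loss'[^}]*\})"
)


def parse_records(text):
    """Return (timestamp, metrics) pairs for every training record in `text`."""
    records = []
    for date, clock, payload in RECORD_RE.findall(text):
        ts = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S,%f")
        # The payload is a plain dict literal, so ast.literal_eval parses it without
        # executing anything; newlines inside the braces are accepted as well.
        records.append((ts, ast.literal_eval(payload)))
    return records


def mean_seconds_between(records):
    """Average wall-clock gap, in seconds, between consecutive records."""
    gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(records, records[1:])]
    return sum(gaps) / len(gaps)
```

For example, `parse_records(open("train.log").read())` (the file name is a placeholder) yields one `(datetime, dict)` pair per logged step; on this excerpt consecutive records are roughly 13 seconds apart.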
[INFO|callbacks.py:310] 2024-07-10 16:09:34,272 >> {'loss': 0.0685, 'learning_rate': 1.4667e-06, 'epoch': 2.26, 'throughput': 1030.69}
[INFO|callbacks.py:310] 2024-07-10 16:09:47,345 >> {'loss': 0.0574, 'learning_rate': 1.4750e-06, 'epoch': 2.28, 'throughput': 1030.53}
[INFO|callbacks.py:310] 2024-07-10 16:10:00,383 >> {'loss': 0.0619, 'learning_rate': 1.4833e-06, 'epoch': 2.29, 'throughput': 1030.73}
[INFO|callbacks.py:310] 2024-07-10 16:10:13,410 >> {'loss': 0.0683, 'learning_rate': 1.4917e-06, 'epoch': 2.30, 'throughput': 1030.36}
[INFO|callbacks.py:310] 2024-07-10 16:10:26,483 >> {'loss': 0.0700, 'learning_rate': 1.5000e-06, 'epoch': 2.32, 'throughput': 1030.33}
[INFO|callbacks.py:310] 2024-07-10 16:10:39,532 >> {'loss': 0.1154, 'learning_rate': 1.5083e-06, 'epoch': 2.33, 'throughput': 1030.26}
[INFO|callbacks.py:310] 2024-07-10 16:10:52,574 >> {'loss': 0.0923, 'learning_rate': 1.5167e-06, 'epoch': 2.34, 'throughput': 1030.40}
[INFO|callbacks.py:310] 2024-07-10 16:11:05,633 >> {'loss': 0.0777, 'learning_rate': 1.5250e-06, 'epoch': 2.35, 'throughput': 1030.14}
[INFO|callbacks.py:310] 2024-07-10 16:11:18,722 >> {'loss': 0.0754, 'learning_rate': 1.5333e-06, 'epoch': 2.37, 'throughput': 1030.34}
[INFO|callbacks.py:310] 2024-07-10 16:11:31,778 >> {'loss': 0.0704, 'learning_rate': 1.5417e-06, 'epoch': 2.38, 'throughput': 1030.27}
[INFO|callbacks.py:310] 2024-07-10 16:11:44,816 >> {'loss': 0.0915, 'learning_rate': 1.5500e-06, 'epoch': 2.39, 'throughput': 1030.38}
[INFO|callbacks.py:310] 2024-07-10 16:11:57,871 >> {'loss': 0.0870, 'learning_rate': 1.5583e-06, 'epoch': 2.41, 'throughput': 1030.22}
[INFO|callbacks.py:310] 2024-07-10 16:12:10,914 >> {'loss': 0.0566, 'learning_rate': 1.5667e-06, 'epoch': 2.42, 'throughput': 1030.35}
[INFO|callbacks.py:310] 2024-07-10 16:12:23,986 >> {'loss': 0.1037, 'learning_rate': 1.5750e-06, 'epoch': 2.43, 'throughput': 1030.56}
[INFO|callbacks.py:310] 2024-07-10 16:12:37,049 >> {'loss': 0.1143, 'learning_rate': 1.5833e-06, 'epoch': 2.44, 'throughput': 1030.68}
[INFO|callbacks.py:310] 2024-07-10 16:12:50,149 >> {'loss': 0.0829, 'learning_rate': 1.5917e-06, 'epoch': 2.46, 'throughput': 1030.82}
[INFO|callbacks.py:310] 2024-07-10 16:13:03,221 >> {'loss': 0.0422, 'learning_rate': 1.6000e-06, 'epoch': 2.47, 'throughput': 1030.87}
[INFO|callbacks.py:310] 2024-07-10 16:13:16,270 >> {'loss': 0.0727, 'learning_rate': 1.6083e-06, 'epoch': 2.48, 'throughput': 1030.95}
[INFO|callbacks.py:310] 2024-07-10 16:13:29,306 >> {'loss': 0.0836, 'learning_rate': 1.6167e-06, 'epoch': 2.50, 'throughput': 1030.88}
[INFO|callbacks.py:310] 2024-07-10 16:13:42,353 >> {'loss': 0.0803, 'learning_rate': 1.6250e-06, 'epoch': 2.51, 'throughput': 1030.85}
[INFO|callbacks.py:310] 2024-07-10 16:13:55,421 >> {'loss': 0.0654, 'learning_rate': 1.6333e-06, 'epoch': 2.52, 'throughput': 1030.96}
[INFO|callbacks.py:310] 2024-07-10 16:14:08,471 >> {'loss': 0.0587, 'learning_rate': 1.6417e-06, 'epoch': 2.53, 'throughput': 1030.89}
[INFO|callbacks.py:310] 2024-07-10 16:14:21,530 >> {'loss': 0.0848, 'learning_rate': 1.6500e-06, 'epoch': 2.55, 'throughput': 1030.87}
[INFO|callbacks.py:310] 2024-07-10 16:14:34,591 >> {'loss': 0.0525, 'learning_rate': 1.6583e-06, 'epoch': 2.56, 'throughput': 1030.88}
[INFO|callbacks.py:310] 2024-07-10 16:14:47,595 >> {'loss': 0.0677, 'learning_rate': 1.6667e-06, 'epoch': 2.57, 'throughput': 1030.73}
[INFO|callbacks.py:310] 2024-07-10 16:15:00,635 >> {'loss': 0.0620, 'learning_rate': 1.6750e-06, 'epoch': 2.59, 'throughput': 1030.77}
[INFO|callbacks.py:310] 2024-07-10 16:15:13,672 >> {'loss': 0.0674, 'learning_rate': 1.6833e-06, 'epoch': 2.60, 'throughput': 1030.89}
[INFO|callbacks.py:310] 2024-07-10 16:15:26,718 >> {'loss': 0.0533, 'learning_rate': 1.6917e-06, 'epoch': 2.61, 'throughput': 1030.91}
[INFO|callbacks.py:310] 2024-07-10 16:15:39,798 >> {'loss': 0.0757, 'learning_rate': 1.7000e-06, 'epoch': 2.62, 'throughput': 1031.12}
[INFO|callbacks.py:310] 2024-07-10 16:15:52,886 >> {'loss': 0.0777, 'learning_rate': 1.7083e-06, 'epoch': 2.64, 'throughput': 1031.22}
[INFO|callbacks.py:310] 2024-07-10 16:16:05,977 >> {'loss': 0.0921, 'learning_rate': 1.7167e-06, 'epoch': 2.65, 'throughput': 1031.22}
[INFO|callbacks.py:310] 2024-07-10 16:16:19,012 >> {'loss': 0.0378, 'learning_rate': 1.7250e-06, 'epoch': 2.66, 'throughput': 1031.02}
[INFO|callbacks.py:310] 2024-07-10 16:16:32,072 >> {'loss': 0.0671, 'learning_rate': 1.7333e-06, 'epoch': 2.68, 'throughput': 1031.28}
[INFO|callbacks.py:310] 2024-07-10 16:16:45,154 >> {'loss': 0.0664, 'learning_rate': 1.7417e-06, 'epoch': 2.69, 'throughput': 1031.30}
[INFO|callbacks.py:310] 2024-07-10 16:16:58,186 >> {'loss': 0.0720, 'learning_rate': 1.7500e-06, 'epoch': 2.70, 'throughput': 1031.20}
[INFO|callbacks.py:310] 2024-07-10 16:17:11,271 >> {'loss': 0.0883, 'learning_rate': 1.7583e-06, 'epoch': 2.71, 'throughput': 1031.43}
[INFO|callbacks.py:310] 2024-07-10 16:17:24,320 >> {'loss': 0.0414, 'learning_rate': 1.7667e-06, 'epoch': 2.73, 'throughput': 1031.35}
[INFO|callbacks.py:310] 2024-07-10 16:17:37,394 >> {'loss': 0.0310, 'learning_rate': 1.7750e-06, 'epoch': 2.74, 'throughput': 1031.16}
[INFO|callbacks.py:310] 2024-07-10 16:17:50,443 >> {'loss': 0.0634, 'learning_rate': 1.7833e-06, 'epoch': 2.75, 'throughput': 1031.04}
[INFO|callbacks.py:310] 2024-07-10 16:18:03,511 >> {'loss': 0.0837, 'learning_rate': 1.7917e-06, 'epoch': 2.77, 'throughput': 1031.15}
[INFO|callbacks.py:310] 2024-07-10 16:18:16,574 >> {'loss': 0.0855, 'learning_rate': 1.8000e-06, 'epoch': 2.78, 'throughput': 1031.11}
[INFO|callbacks.py:310] 2024-07-10 16:18:29,592 >> {'loss': 0.0945, 'learning_rate': 1.8083e-06, 'epoch': 2.79, 'throughput': 1030.85}
[INFO|callbacks.py:310] 2024-07-10 16:18:42,612 >> {'loss': 0.0780, 'learning_rate': 1.8167e-06, 'epoch': 2.80, 'throughput': 1030.78}
[INFO|callbacks.py:310] 2024-07-10 16:18:55,683 >> {'loss': 0.0573, 'learning_rate': 1.8250e-06, 'epoch': 2.82, 'throughput': 1030.59}
[INFO|callbacks.py:310] 2024-07-10 16:19:08,787 >> {'loss': 0.0806, 'learning_rate': 1.8333e-06, 'epoch': 2.83, 'throughput': 1030.71}
[INFO|callbacks.py:310] 2024-07-10 16:19:21,866 >> {'loss': 0.0961, 'learning_rate': 1.8417e-06, 'epoch': 2.84, 'throughput': 1030.63}
[INFO|callbacks.py:310] 2024-07-10 16:19:34,886 >> {'loss': 0.0732, 'learning_rate': 1.8500e-06, 'epoch': 2.86, 'throughput': 1030.70}
[INFO|callbacks.py:310] 2024-07-10 16:19:47,985 >> {'loss': 0.0957, 'learning_rate': 1.8583e-06, 'epoch': 2.87, 'throughput': 1030.89}
[INFO|callbacks.py:310] 2024-07-10 16:20:01,036 >> {'loss': 0.0774, 'learning_rate': 1.8667e-06, 'epoch': 2.88, 'throughput': 1030.97}
[INFO|callbacks.py:310] 2024-07-10 16:20:14,106 >> {'loss': 0.0691, 'learning_rate': 1.8750e-06, 'epoch': 2.89, 'throughput': 1031.09}
[INFO|callbacks.py:310] 2024-07-10 16:20:27,180 >> {'loss': 0.0529, 'learning_rate': 1.8833e-06, 'epoch': 2.91, 'throughput': 1031.31}
[INFO|callbacks.py:310] 2024-07-10 16:20:40,264 >> {'loss': 0.0811, 'learning_rate': 1.8917e-06, 'epoch': 2.92, 'throughput': 1031.18}
[INFO|callbacks.py:310] 2024-07-10 16:20:53,356 >> {'loss': 0.1211, 'learning_rate': 1.9000e-06, 'epoch': 2.93, 'throughput': 1031.32}
[INFO|callbacks.py:310] 2024-07-10 16:21:06,425 >> {'loss': 0.0489, 'learning_rate': 1.9083e-06, 'epoch': 2.95, 'throughput': 1031.40}
[INFO|callbacks.py:310] 2024-07-10 16:21:19,473 >> {'loss': 0.0947, 'learning_rate': 1.9167e-06, 'epoch': 2.96, 'throughput': 1031.77}
[INFO|callbacks.py:310] 2024-07-10 16:21:32,529 >> {'loss': 0.0561, 'learning_rate': 1.9250e-06, 'epoch': 2.97, 'throughput': 1031.72}
[INFO|callbacks.py:310] 2024-07-10 16:21:45,617 >> {'loss': 0.0629, 'learning_rate': 1.9333e-06, 'epoch': 2.98, 'throughput': 1031.97}
[INFO|callbacks.py:310] 2024-07-10 16:21:58,684 >> {'loss': 0.0579, 'learning_rate': 1.9417e-06, 'epoch': 3.00, 'throughput': 1031.94}
[INFO|callbacks.py:310] 2024-07-10 16:22:11,759 >> {'loss': 0.0285, 'learning_rate': 1.9500e-06, 'epoch': 3.01, 'throughput': 1031.94}
[INFO|callbacks.py:310] 2024-07-10 16:22:24,852 >> {'loss': 0.0256, 'learning_rate': 1.9583e-06, 'epoch': 3.02, 'throughput': 1031.83}
[INFO|callbacks.py:310] 2024-07-10 16:22:37,918 >> {'loss': 0.0247, 'learning_rate': 1.9667e-06, 'epoch': 3.04, 'throughput': 1031.89}
[INFO|callbacks.py:310] 2024-07-10 16:22:50,948 >> {'loss': 0.0325, 'learning_rate': 1.9750e-06, 'epoch': 3.05, 'throughput': 1031.71}
[INFO|callbacks.py:310] 2024-07-10 16:23:03,975 >> {'loss': 0.0172, 'learning_rate': 1.9833e-06, 'epoch': 3.06, 'throughput': 1031.83}
[INFO|callbacks.py:310] 2024-07-10 16:23:17,052 >> {'loss': 0.0500, 'learning_rate': 1.9917e-06, 'epoch': 3.07, 'throughput': 1032.03}
[INFO|callbacks.py:310] 2024-07-10 16:23:30,104 >> {'loss': 0.0134, 'learning_rate': 2.0000e-06, 'epoch': 3.09, 'throughput': 1032.07}
[INFO|callbacks.py:310] 2024-07-10 16:23:43,194 >> {'loss': 0.0434, 'learning_rate': 2.0083e-06, 'epoch': 3.10, 'throughput': 1032.06}
[INFO|callbacks.py:310] 2024-07-10 16:23:56,261 >> {'loss': 0.0186, 'learning_rate': 2.0167e-06, 'epoch': 3.11, 'throughput': 1031.79}
[INFO|callbacks.py:310] 2024-07-10 16:24:09,346 >> {'loss': 0.0341, 'learning_rate': 2.0250e-06, 'epoch': 3.13, 'throughput': 1031.70}
[INFO|callbacks.py:310] 2024-07-10 16:24:22,384 >> {'loss': 0.0386, 'learning_rate': 2.0333e-06, 'epoch': 3.14, 'throughput': 1031.75}
[INFO|callbacks.py:310] 2024-07-10 16:24:35,442 >> {'loss': 0.0389, 'learning_rate': 2.0417e-06, 'epoch': 3.15, 'throughput': 1031.84}
[INFO|callbacks.py:310] 2024-07-10 16:24:48,496 >> {'loss': 0.0227, 'learning_rate': 2.0500e-06, 'epoch': 3.16, 'throughput': 1031.98}
[INFO|callbacks.py:310] 2024-07-10 16:25:01,559 >> {'loss': 0.0317, 'learning_rate': 2.0583e-06, 'epoch': 3.18, 'throughput': 1032.06}
[INFO|callbacks.py:310] 2024-07-10 16:25:14,587 >> {'loss': 0.0335, 'learning_rate': 2.0667e-06, 'epoch': 3.19, 'throughput': 1031.85}
[INFO|callbacks.py:310] 2024-07-10 16:25:27,658 >> {'loss': 0.0257, 'learning_rate': 2.0750e-06, 'epoch': 3.20, 'throughput': 1031.71}
[INFO|callbacks.py:310] 2024-07-10 16:25:40,758 >> {'loss': 0.0244, 'learning_rate': 2.0833e-06, 'epoch': 3.22, 'throughput': 1031.99}
[INFO|callbacks.py:310] 2024-07-10 16:25:53,802 >> {'loss': 0.0285, 'learning_rate': 2.0917e-06, 'epoch': 3.23, 'throughput': 1031.85}
[INFO|callbacks.py:310] 2024-07-10 16:26:06,837 >> {'loss': 0.0093, 'learning_rate': 2.1000e-06, 'epoch': 3.24, 'throughput': 1031.89}
[INFO|callbacks.py:310] 2024-07-10 16:26:19,879 >> {'loss': 0.0415, 'learning_rate': 2.1083e-06, 'epoch': 3.25, 'throughput': 1031.75}
[INFO|callbacks.py:310] 2024-07-10 16:26:32,890 >> {'loss': 0.0239, 'learning_rate': 2.1167e-06, 'epoch': 3.27, 'throughput': 1031.75}
[INFO|callbacks.py:310] 2024-07-10 16:26:45,971 >> {'loss': 0.0412, 'learning_rate': 2.1250e-06, 'epoch': 3.28, 'throughput': 1031.71}
[INFO|callbacks.py:310] 2024-07-10 16:26:59,041 >> {'loss': 0.0503, 'learning_rate': 2.1333e-06, 'epoch': 3.29, 'throughput': 1031.89}
[INFO|callbacks.py:310] 2024-07-10 16:27:12,121 >> {'loss': 0.0046, 'learning_rate': 2.1417e-06, 'epoch': 3.31, 'throughput': 1031.66}
[INFO|callbacks.py:310] 2024-07-10 16:27:25,205 >> {'loss': 0.0410, 'learning_rate': 2.1500e-06, 'epoch': 3.32, 'throughput': 1031.77}
[INFO|callbacks.py:310] 2024-07-10 16:27:38,269 >> {'loss': 0.0257, 'learning_rate': 2.1583e-06, 'epoch': 3.33, 'throughput': 1031.85}
[INFO|callbacks.py:310] 2024-07-10 16:27:51,299 >> {'loss': 0.0168, 'learning_rate': 2.1667e-06, 'epoch': 3.34, 'throughput': 1031.84}
[INFO|callbacks.py:310] 2024-07-10 16:28:04,359 >> {'loss': 0.0439, 'learning_rate': 2.1750e-06, 'epoch': 3.36, 'throughput': 1031.65}
[INFO|callbacks.py:310] 2024-07-10 16:28:17,402 >> {'loss': 0.0204, 'learning_rate': 2.1833e-06, 'epoch': 3.37, 'throughput': 1031.70}
[INFO|callbacks.py:310] 2024-07-10 16:28:30,454 >> {'loss': 0.0284, 'learning_rate': 2.1917e-06, 'epoch': 3.38, 'throughput': 1031.59}
[INFO|callbacks.py:310] 2024-07-10 16:28:43,535 >> {'loss': 0.0684, 'learning_rate': 2.2000e-06, 'epoch': 3.40, 'throughput': 1031.39}
[INFO|callbacks.py:310] 2024-07-10 16:28:56,599 >> {'loss': 0.0479, 'learning_rate': 2.2083e-06, 'epoch': 3.41, 'throughput': 1031.24}
[INFO|callbacks.py:310] 2024-07-10 16:29:09,608 >> {'loss': 0.0434, 'learning_rate': 2.2167e-06, 'epoch': 3.42, 'throughput': 1031.11}
[INFO|callbacks.py:310] 2024-07-10 16:29:22,660 >> {'loss': 0.0213, 'learning_rate': 2.2250e-06, 'epoch': 3.43, 'throughput': 1031.15}
[INFO|callbacks.py:310] 2024-07-10 16:29:35,701 >> {'loss': 0.0415, 'learning_rate': 2.2333e-06, 'epoch': 3.45, 'throughput': 1031.22}
[INFO|callbacks.py:310] 2024-07-10 16:29:48,780 >> {'loss': 0.0404, 'learning_rate': 2.2417e-06, 'epoch': 3.46, 'throughput': 1031.14}
[INFO|callbacks.py:310] 2024-07-10 16:30:01,859 >> {'loss': 0.0566, 'learning_rate': 2.2500e-06, 'epoch': 3.47, 'throughput': 1031.20}
[INFO|callbacks.py:310] 2024-07-10 16:30:14,945 >> {'loss': 0.0509, 'learning_rate': 2.2583e-06, 'epoch': 3.49, 'throughput': 1031.20}
[INFO|callbacks.py:310] 2024-07-10 16:30:28,041 >> {'loss': 0.0385, 'learning_rate': 2.2667e-06, 'epoch': 3.50, 'throughput': 1031.23}
[INFO|callbacks.py:310] 2024-07-10 16:30:41,117 >> {'loss': 0.0225, 'learning_rate': 2.2750e-06, 'epoch': 3.51, 'throughput': 1031.20}
[INFO|callbacks.py:310] 2024-07-10 16:30:54,140 >> {'loss': 0.0255, 'learning_rate': 2.2833e-06, 'epoch': 3.52, 'throughput': 1031.11}
[INFO|callbacks.py:310] 2024-07-10 16:31:07,183 >> {'loss': 0.0531, 'learning_rate': 2.2917e-06, 'epoch': 3.54, 'throughput': 1030.96}
[INFO|callbacks.py:310] 2024-07-10 16:31:20,244 >> {'loss': 0.0095, 'learning_rate': 2.3000e-06, 'epoch': 3.55, 'throughput': 1031.07}
[INFO|callbacks.py:310] 2024-07-10 16:31:33,302 >> {'loss': 0.0229, 'learning_rate': 2.3083e-06, 'epoch': 3.56, 'throughput': 1031.17}
[INFO|callbacks.py:310] 2024-07-10 16:31:46,386 >> {'loss': 0.0380, 'learning_rate': 2.3167e-06, 'epoch': 3.58, 'throughput': 1031.41}
[INFO|callbacks.py:310] 2024-07-10 16:31:59,473 >> {'loss': 0.0316, 'learning_rate': 2.3250e-06, 'epoch': 3.59, 'throughput': 1031.40}
[INFO|callbacks.py:310] 2024-07-10 16:32:12,500 >> {'loss': 0.0861, 'learning_rate': 2.3333e-06, 'epoch': 3.60, 'throughput': 1031.09}
[INFO|callbacks.py:310] 2024-07-10 16:32:25,573 >> {'loss': 0.0566, 'learning_rate': 2.3417e-06, 'epoch': 3.61, 'throughput': 1031.26}
[INFO|callbacks.py:310] 2024-07-10 16:32:38,614 >> {'loss': 0.0804, 'learning_rate': 2.3500e-06, 'epoch': 3.63, 'throughput': 1031.33}
[INFO|callbacks.py:310] 2024-07-10 16:32:51,675 >> {'loss': 0.0460, 'learning_rate': 2.3583e-06, 'epoch': 3.64, 'throughput': 1031.43}
[INFO|callbacks.py:310] 2024-07-10 16:33:04,743 >> {'loss': 0.0693, 'learning_rate': 2.3667e-06, 'epoch': 3.65, 'throughput': 1031.38}
[INFO|callbacks.py:310] 2024-07-10 16:33:17,816 >> {'loss': 0.0342, 'learning_rate': 2.3750e-06, 'epoch': 3.67, 'throughput': 1031.47}
[INFO|callbacks.py:310] 2024-07-10 16:33:30,909 >> {'loss': 0.0479, 'learning_rate': 2.3833e-06, 'epoch': 3.68, 'throughput': 1031.40}
[INFO|callbacks.py:310] 2024-07-10 16:33:44,006 >> {'loss': 0.0388, 'learning_rate': 2.3917e-06, 'epoch': 3.69, 'throughput': 1031.51}
[INFO|callbacks.py:310] 2024-07-10 16:33:57,059 >> {'loss': 0.0274, 'learning_rate': 2.4000e-06, 'epoch': 3.70, 'throughput': 1031.62}
[INFO|callbacks.py:310] 2024-07-10 16:34:10,124 >> {'loss': 0.0259, 'learning_rate': 2.4083e-06, 'epoch': 3.72, 'throughput': 1031.83}
[INFO|callbacks.py:310] 2024-07-10 16:34:23,174 >> {'loss': 0.0367, 'learning_rate': 2.4167e-06, 'epoch': 3.73, 'throughput': 1031.66}
[INFO|callbacks.py:310] 2024-07-10 16:34:36,226 >> {'loss': 0.0661, 'learning_rate': 2.4250e-06, 'epoch': 3.74, 'throughput': 1031.57}
[INFO|callbacks.py:310] 2024-07-10 16:34:49,271 >> {'loss': 0.0466, 'learning_rate': 2.4333e-06, 'epoch': 3.76, 'throughput': 1031.53}
[INFO|callbacks.py:310] 2024-07-10 16:35:02,361 >> {'loss': 0.0286, 'learning_rate': 2.4417e-06, 'epoch': 3.77, 'throughput': 1031.51}
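The `learning_rate` field in these records rises by the same amount at every record, about 8.33e-09 (1e-06 / 120), and never plateaus or decays anywhere in this excerpt. A quick check using a few values copied verbatim from the log, under the assumption (inferred from the data, not read from any config) that one record is written per optimizer step and that lr(step) = step × 1e-06 / 120. Under that assumption the last value logged before `checkpoint-385` is saved maps to step 385, which is consistent with the run ending while the schedule was still ramping up linearly, for example inside a warmup phase longer than the run. The log itself does not show the configured learning rate or warmup settings, so treat that reading as an inference.

```python
# Consecutive learning_rate values copied from the first records of this excerpt
# (epochs 1.50 to 1.56), plus the last value logged before checkpoint-385 is saved.
EARLY_LRS = [9.7500e-07, 9.8333e-07, 9.9167e-07, 1.0000e-06, 1.0083e-06]
FINAL_LR = 3.2083e-06

# Assumed per-step increment, inferred from the logged values (hypothetical constant).
PER_STEP = 1e-06 / 120

# Logged values carry 5 significant digits, so differences can be off by up to ~1e-10.
TOLERANCE = 1e-10


def increments(lrs):
    """Differences between consecutive logged learning rates."""
    return [b - a for a, b in zip(lrs, lrs[1:])]


def implied_steps(lrs, per_step=PER_STEP):
    """Step index each value would correspond to if lr(step) = step * per_step."""
    return [round(lr / per_step) for lr in lrs]


if __name__ == "__main__":
    print(all(abs(d - PER_STEP) < TOLERANCE for d in increments(EARLY_LRS)))  # True
    print(implied_steps(EARLY_LRS))  # [117, 118, 119, 120, 121]
    print(implied_steps([FINAL_LR]))  # [385]
```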
[INFO|callbacks.py:310] 2024-07-10 16:35:15,418 >> {'loss': 0.0586, 'learning_rate': 2.4500e-06, 'epoch': 3.78, 'throughput': 1031.36}
[INFO|callbacks.py:310] 2024-07-10 16:35:28,427 >> {'loss': 0.0329, 'learning_rate': 2.4583e-06, 'epoch': 3.79, 'throughput': 1031.22}
[INFO|callbacks.py:310] 2024-07-10 16:35:41,482 >> {'loss': 0.0582, 'learning_rate': 2.4667e-06, 'epoch': 3.81, 'throughput': 1031.30}
[INFO|callbacks.py:310] 2024-07-10 16:35:54,541 >> {'loss': 0.0312, 'learning_rate': 2.4750e-06, 'epoch': 3.82, 'throughput': 1031.36}
[INFO|callbacks.py:310] 2024-07-10 16:36:07,569 >> {'loss': 0.0329, 'learning_rate': 2.4833e-06, 'epoch': 3.83, 'throughput': 1031.39}
[INFO|callbacks.py:310] 2024-07-10 16:36:20,626 >> {'loss': 0.0206, 'learning_rate': 2.4917e-06, 'epoch': 3.85, 'throughput': 1031.35}
[INFO|callbacks.py:310] 2024-07-10 16:36:33,678 >> {'loss': 0.0426, 'learning_rate': 2.5000e-06, 'epoch': 3.86, 'throughput': 1031.30}
[INFO|callbacks.py:310] 2024-07-10 16:36:46,762 >> {'loss': 0.0179, 'learning_rate': 2.5083e-06, 'epoch': 3.87, 'throughput': 1031.28}
[INFO|callbacks.py:310] 2024-07-10 16:36:59,795 >> {'loss': 0.0289, 'learning_rate': 2.5167e-06, 'epoch': 3.88, 'throughput': 1031.45}
[INFO|callbacks.py:310] 2024-07-10 16:37:12,850 >> {'loss': 0.0303, 'learning_rate': 2.5250e-06, 'epoch': 3.90, 'throughput': 1031.48}
[INFO|callbacks.py:310] 2024-07-10 16:37:25,916 >> {'loss': 0.0460, 'learning_rate': 2.5333e-06, 'epoch': 3.91, 'throughput': 1031.62}
[INFO|callbacks.py:310] 2024-07-10 16:37:38,945 >> {'loss': 0.0523, 'learning_rate': 2.5417e-06, 'epoch': 3.92, 'throughput': 1031.55}
[INFO|callbacks.py:310] 2024-07-10 16:37:51,996 >> {'loss': 0.0329, 'learning_rate': 2.5500e-06, 'epoch': 3.94, 'throughput': 1031.52}
[INFO|callbacks.py:310] 2024-07-10 16:38:05,038 >> {'loss': 0.0072, 'learning_rate': 2.5583e-06, 'epoch': 3.95, 'throughput': 1031.37}
[INFO|callbacks.py:310] 2024-07-10 16:38:18,108 >> {'loss': 0.0415, 'learning_rate': 2.5667e-06, 'epoch': 3.96, 'throughput': 1031.46}
[INFO|callbacks.py:310] 2024-07-10 16:38:31,175 >> {'loss': 0.0233, 'learning_rate': 2.5750e-06, 'epoch': 3.97, 'throughput': 1031.53}
[INFO|callbacks.py:310] 2024-07-10 16:38:44,257 >> {'loss': 0.0423, 'learning_rate': 2.5833e-06, 'epoch': 3.99, 'throughput': 1031.73}
[INFO|callbacks.py:310] 2024-07-10 16:38:57,319 >> {'loss': 0.0295, 'learning_rate': 2.5917e-06, 'epoch': 4.00, 'throughput': 1031.67}
[INFO|callbacks.py:310] 2024-07-10 16:39:10,340 >> {'loss': 0.0327, 'learning_rate': 2.6000e-06, 'epoch': 4.01, 'throughput': 1031.56}
[INFO|callbacks.py:310] 2024-07-10 16:39:23,403 >> {'loss': 0.0301, 'learning_rate': 2.6083e-06, 'epoch': 4.03, 'throughput': 1031.43}
[INFO|callbacks.py:310] 2024-07-10 16:39:36,458 >> {'loss': 0.0301, 'learning_rate': 2.6167e-06, 'epoch': 4.04, 'throughput': 1031.40}
[INFO|callbacks.py:310] 2024-07-10 16:39:49,518 >> {'loss': 0.0281, 'learning_rate': 2.6250e-06, 'epoch': 4.05, 'throughput': 1031.24}
[INFO|callbacks.py:310] 2024-07-10 16:40:02,644 >> {'loss': 0.0136, 'learning_rate': 2.6333e-06, 'epoch': 4.06, 'throughput': 1031.56}
[INFO|callbacks.py:310] 2024-07-10 16:40:15,692 >> {'loss': 0.0219, 'learning_rate': 2.6417e-06, 'epoch': 4.08, 'throughput': 1031.48}
[INFO|callbacks.py:310] 2024-07-10 16:40:28,769 >> {'loss': 0.0044, 'learning_rate': 2.6500e-06, 'epoch': 4.09, 'throughput': 1031.55}
[INFO|callbacks.py:310] 2024-07-10 16:40:41,845 >> {'loss': 0.0335, 'learning_rate': 2.6583e-06, 'epoch': 4.10, 'throughput': 1031.59}
[INFO|callbacks.py:310] 2024-07-10 16:40:54,879 >> {'loss': 0.0053, 'learning_rate': 2.6667e-06, 'epoch': 4.12, 'throughput': 1031.55}
[INFO|callbacks.py:310] 2024-07-10 16:41:07,940 >> {'loss': 0.0196, 'learning_rate': 2.6750e-06, 'epoch': 4.13, 'throughput': 1031.35}
[INFO|callbacks.py:310] 2024-07-10 16:41:21,035 >> {'loss': 0.0309, 'learning_rate': 2.6833e-06, 'epoch': 4.14, 'throughput': 1031.36}
[INFO|callbacks.py:310] 2024-07-10 16:41:34,100 >> {'loss': 0.0382, 'learning_rate': 2.6917e-06, 'epoch': 4.15, 'throughput': 1031.20}
[INFO|callbacks.py:310] 2024-07-10 16:41:47,159 >> {'loss': 0.0460, 'learning_rate': 2.7000e-06, 'epoch': 4.17, 'throughput': 1031.28}
[INFO|callbacks.py:310] 2024-07-10 16:42:00,206 >> {'loss': 0.0133, 'learning_rate': 2.7083e-06, 'epoch': 4.18, 'throughput': 1031.13}
[INFO|callbacks.py:310] 2024-07-10 16:42:13,260 >> {'loss': 0.0265, 'learning_rate': 2.7167e-06, 'epoch': 4.19, 'throughput': 1031.13}
[INFO|callbacks.py:310] 2024-07-10 16:42:26,301 >> {'loss': 0.0084, 'learning_rate': 2.7250e-06, 'epoch': 4.21, 'throughput': 1031.11}
[INFO|callbacks.py:310] 2024-07-10 16:42:39,360 >> {'loss': 0.0382, 'learning_rate': 2.7333e-06, 'epoch': 4.22, 'throughput': 1031.02}
[INFO|callbacks.py:310] 2024-07-10 16:42:52,463 >> {'loss': 0.0101, 'learning_rate': 2.7417e-06, 'epoch': 4.23, 'throughput': 1031.21}
[INFO|callbacks.py:310] 2024-07-10 16:43:05,548 >> {'loss': 0.0174, 'learning_rate': 2.7500e-06, 'epoch': 4.24, 'throughput': 1031.21}
[INFO|callbacks.py:310] 2024-07-10 16:43:18,594 >> {'loss': 0.0230, 'learning_rate': 2.7583e-06, 'epoch': 4.26, 'throughput': 1031.11}
[INFO|callbacks.py:310] 2024-07-10 16:43:31,646 >> {'loss': 0.0162, 'learning_rate': 2.7667e-06, 'epoch': 4.27, 'throughput': 1031.23}
[INFO|callbacks.py:310] 2024-07-10 16:43:44,663 >> {'loss': 0.0261, 'learning_rate': 2.7750e-06, 'epoch': 4.28, 'throughput': 1031.06}
[INFO|callbacks.py:310] 2024-07-10 16:43:57,701 >> {'loss': 0.0266, 'learning_rate': 2.7833e-06, 'epoch': 4.30, 'throughput': 1031.04}
[INFO|callbacks.py:310] 2024-07-10 16:44:10,756 >> {'loss': 0.0194, 'learning_rate': 2.7917e-06, 'epoch': 4.31, 'throughput': 1030.96}
[INFO|callbacks.py:310] 2024-07-10 16:44:23,810 >> {'loss': 0.0058, 'learning_rate': 2.8000e-06, 'epoch': 4.32, 'throughput': 1031.08}
[INFO|callbacks.py:310] 2024-07-10 16:44:36,914 >> {'loss': 0.0065, 'learning_rate': 2.8083e-06, 'epoch': 4.33, 'throughput': 1031.06}
[INFO|callbacks.py:310] 2024-07-10 16:44:50,018 >> {'loss': 0.0202, 'learning_rate': 2.8167e-06, 'epoch': 4.35, 'throughput': 1031.13}
[INFO|callbacks.py:310] 2024-07-10 16:45:03,057 >> {'loss': 0.0135, 'learning_rate': 2.8250e-06, 'epoch': 4.36, 'throughput': 1031.08}
[INFO|callbacks.py:310] 2024-07-10 16:45:16,126 >> {'loss': 0.0100, 'learning_rate': 2.8333e-06, 'epoch': 4.37, 'throughput': 1031.12}
[INFO|callbacks.py:310] 2024-07-10 16:45:29,168 >> {'loss': 0.0051, 'learning_rate': 2.8417e-06, 'epoch': 4.39, 'throughput': 1031.01}
[INFO|callbacks.py:310] 2024-07-10 16:45:42,203 >> {'loss': 0.0293, 'learning_rate': 2.8500e-06, 'epoch': 4.40, 'throughput': 1031.19}
[INFO|callbacks.py:310] 2024-07-10 16:45:55,247 >> {'loss': 0.0460, 'learning_rate': 2.8583e-06, 'epoch': 4.41, 'throughput': 1031.23}
[INFO|callbacks.py:310] 2024-07-10 16:46:08,301 >> {'loss': 0.0024, 'learning_rate': 2.8667e-06, 'epoch': 4.42, 'throughput': 1031.26}
[INFO|callbacks.py:310] 2024-07-10 16:46:21,356 >> {'loss': 0.0211, 'learning_rate': 2.8750e-06, 'epoch': 4.44, 'throughput': 1031.10}
[INFO|callbacks.py:310] 2024-07-10 16:46:34,393 >> {'loss': 0.0229, 'learning_rate': 2.8833e-06, 'epoch': 4.45, 'throughput': 1031.11}
[INFO|callbacks.py:310] 2024-07-10 16:46:47,487 >> {'loss': 0.0103, 'learning_rate': 2.8917e-06, 'epoch': 4.46, 'throughput': 1031.19}
[INFO|callbacks.py:310] 2024-07-10 16:47:00,516 >> {'loss': 0.0262, 'learning_rate': 2.9000e-06, 'epoch': 4.48, 'throughput': 1031.12}
[INFO|callbacks.py:310] 2024-07-10 16:47:13,566 >> {'loss': 0.0295, 'learning_rate': 2.9083e-06, 'epoch': 4.49, 'throughput': 1031.24}
[INFO|callbacks.py:310] 2024-07-10 16:47:26,605 >> {'loss': 0.0149, 'learning_rate': 2.9167e-06, 'epoch': 4.50, 'throughput': 1031.25}
[INFO|callbacks.py:310] 2024-07-10 16:47:39,682 >> {'loss': 0.0337, 'learning_rate': 2.9250e-06, 'epoch': 4.51, 'throughput': 1031.25}
[INFO|callbacks.py:310] 2024-07-10 16:47:52,786 >> {'loss': 0.0318, 'learning_rate': 2.9333e-06, 'epoch': 4.53, 'throughput': 1031.32}
[INFO|callbacks.py:310] 2024-07-10 16:48:05,841 >> {'loss': 0.0213, 'learning_rate': 2.9417e-06, 'epoch': 4.54, 'throughput': 1031.30}
[INFO|callbacks.py:310] 2024-07-10 16:48:18,816 >> {'loss': 0.0048, 'learning_rate': 2.9500e-06, 'epoch': 4.55, 'throughput': 1031.03}
[INFO|callbacks.py:310] 2024-07-10 16:48:31,878 >> {'loss': 0.0326, 'learning_rate': 2.9583e-06, 'epoch': 4.57, 'throughput': 1030.92}
[INFO|callbacks.py:310] 2024-07-10 16:48:44,907 >> {'loss': 0.0130, 'learning_rate': 2.9667e-06, 'epoch': 4.58, 'throughput': 1030.99}
[INFO|callbacks.py:310] 2024-07-10 16:48:57,961 >> {'loss': 0.0293, 'learning_rate': 2.9750e-06, 'epoch': 4.59, 'throughput': 1030.97}
[INFO|callbacks.py:310] 2024-07-10 16:49:11,009 >> {'loss': 0.0411, 'learning_rate': 2.9833e-06, 'epoch': 4.60, 'throughput': 1030.93}
[INFO|callbacks.py:310] 2024-07-10 16:49:24,110 >> {'loss': 0.0389, 'learning_rate': 2.9917e-06, 'epoch': 4.62, 'throughput': 1031.06}
[INFO|callbacks.py:310] 2024-07-10 16:49:37,190 >> {'loss': 0.0395, 'learning_rate': 3.0000e-06, 'epoch': 4.63, 'throughput': 1031.22}
[INFO|callbacks.py:310] 2024-07-10 16:49:50,206 >> {'loss': 0.0065, 'learning_rate': 3.0083e-06, 'epoch': 4.64, 'throughput': 1031.24}
[INFO|callbacks.py:310] 2024-07-10 16:50:03,225 >> {'loss': 0.0294, 'learning_rate': 3.0167e-06, 'epoch': 4.66, 'throughput': 1031.15}
[INFO|callbacks.py:310] 2024-07-10 16:50:16,282 >> {'loss': 0.0192, 'learning_rate': 3.0250e-06, 'epoch': 4.67, 'throughput': 1031.15}
[INFO|callbacks.py:310] 2024-07-10 16:50:29,374 >> {'loss': 0.0179, 'learning_rate': 3.0333e-06, 'epoch': 4.68, 'throughput': 1031.31}
[INFO|callbacks.py:310] 2024-07-10 16:50:42,461 >> {'loss': 0.0131, 'learning_rate': 3.0417e-06, 'epoch': 4.69, 'throughput': 1031.45}
[INFO|callbacks.py:310] 2024-07-10 16:50:55,506 >> {'loss': 0.0216, 'learning_rate': 3.0500e-06, 'epoch': 4.71, 'throughput': 1031.33}
[INFO|callbacks.py:310] 2024-07-10 16:51:08,603 >> {'loss': 0.0171, 'learning_rate': 3.0583e-06, 'epoch': 4.72, 'throughput': 1031.54}
[INFO|callbacks.py:310] 2024-07-10 16:51:21,642 >> {'loss': 0.0129, 'learning_rate': 3.0667e-06, 'epoch': 4.73, 'throughput': 1031.53}
[INFO|callbacks.py:310] 2024-07-10 16:51:34,663 >> {'loss': 0.0268, 'learning_rate': 3.0750e-06, 'epoch': 4.75, 'throughput': 1031.60}
[INFO|callbacks.py:310] 2024-07-10 16:51:47,770 >> {'loss': 0.0313, 'learning_rate': 3.0833e-06, 'epoch': 4.76, 'throughput': 1031.67}
[INFO|callbacks.py:310] 2024-07-10 16:52:00,829 >> {'loss': 0.0197, 'learning_rate': 3.0917e-06, 'epoch': 4.77, 'throughput': 1031.73}
[INFO|callbacks.py:310] 2024-07-10 16:52:13,901 >> {'loss': 0.0051, 'learning_rate': 3.1000e-06, 'epoch': 4.78, 'throughput': 1031.89}
[INFO|callbacks.py:310] 2024-07-10 16:52:27,006 >> {'loss': 0.0107, 'learning_rate': 3.1083e-06, 'epoch': 4.80, 'throughput': 1031.88}
[INFO|callbacks.py:310] 2024-07-10 16:52:40,104 >> {'loss': 0.0299, 'learning_rate': 3.1167e-06, 'epoch': 4.81, 'throughput': 1031.92}
[INFO|callbacks.py:310] 2024-07-10 16:52:53,184 >> {'loss': 0.0549, 'learning_rate': 3.1250e-06, 'epoch': 4.82, 'throughput': 1032.12}
[INFO|callbacks.py:310] 2024-07-10 16:53:06,195 >> {'loss': 0.0163, 'learning_rate': 3.1333e-06, 'epoch': 4.84, 'throughput': 1032.05}
[INFO|callbacks.py:310] 2024-07-10 16:53:19,295 >> {'loss': 0.0133, 'learning_rate': 3.1417e-06, 'epoch': 4.85, 'throughput': 1032.16}
[INFO|callbacks.py:310] 2024-07-10 16:53:32,359 >> {'loss': 0.0338, 'learning_rate': 3.1500e-06, 'epoch': 4.86, 'throughput': 1032.15}
[INFO|callbacks.py:310] 2024-07-10 16:53:45,415 >> {'loss': 0.0219, 'learning_rate': 3.1583e-06, 'epoch': 4.87, 'throughput': 1032.12}
[INFO|callbacks.py:310] 2024-07-10 16:53:58,454 >> {'loss': 0.0113, 'learning_rate': 3.1667e-06, 'epoch': 4.89, 'throughput': 1032.06}
[INFO|callbacks.py:310] 2024-07-10 16:54:11,538 >> {'loss': 0.0297, 'learning_rate': 3.1750e-06, 'epoch': 4.90, 'throughput': 1031.99}
[INFO|callbacks.py:310] 2024-07-10 16:54:24,604 >> {'loss': 0.0417, 'learning_rate': 3.1833e-06, 'epoch': 4.91, 'throughput': 1031.99}
[INFO|callbacks.py:310] 2024-07-10 16:54:37,636 >> {'loss': 0.0270, 'learning_rate': 3.1917e-06, 'epoch': 4.93, 'throughput': 1031.98}
[INFO|callbacks.py:310] 2024-07-10 16:54:50,650 >> {'loss': 0.0271, 'learning_rate': 3.2000e-06, 'epoch': 4.94, 'throughput': 1031.83}
[INFO|callbacks.py:310] 2024-07-10 16:55:03,728 >> {'loss': 0.0207, 'learning_rate': 3.2083e-06, 'epoch': 4.95, 'throughput': 1032.00}
[INFO|trainer.py:3478] 2024-07-10 16:55:11,454 >> Saving model checkpoint to saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/checkpoint-385
[INFO|configuration_utils.py:472] 2024-07-10 16:55:11,458 >> Configuration saved in saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/checkpoint-385/config.json
[INFO|configuration_utils.py:769] 2024-07-10 16:55:11,458 >> Configuration saved in saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/checkpoint-385/generation_config.json
[INFO|modeling_utils.py:2698] 2024-07-10 16:55:27,632 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/checkpoint-385/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2574] 2024-07-10 16:55:27,635 >> tokenizer config file saved in saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/checkpoint-385/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-07-10 16:55:27,636 >> Special tokens file saved in saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/checkpoint-385/special_tokens_map.json
[INFO|trainer.py:2383] 2024-07-10 16:56:04,457 >> Training completed. Do not forget to share your model on huggingface.co/models =)
[INFO|trainer.py:3478] 2024-07-10 16:56:12,165 >> Saving model checkpoint to saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3
[INFO|configuration_utils.py:472] 2024-07-10 16:56:12,167 >> Configuration saved in saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/config.json
[INFO|configuration_utils.py:769] 2024-07-10 16:56:12,168 >> Configuration saved in saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/generation_config.json
[INFO|modeling_utils.py:2698] 2024-07-10 16:56:28,747 >> The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 4 checkpoint shards. You can find where each parameters has been saved in the index located at saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/model.safetensors.index.json.
[INFO|tokenization_utils_base.py:2574] 2024-07-10 16:56:28,750 >> tokenizer config file saved in saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/tokenizer_config.json
[INFO|tokenization_utils_base.py:2583] 2024-07-10 16:56:28,750 >> Special tokens file saved in saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/special_tokens_map.json
[WARNING|ploting.py:89] 2024-07-10 16:56:30,068 >> No metric eval_loss to plot.
[WARNING|ploting.py:89] 2024-07-10 16:56:30,068 >> No metric eval_accuracy to plot.
[INFO|modelcard.py:449] 2024-07-10 16:56:30,069 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
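Both saves report that the weights were split into 4 shards, with an index file recording which shard holds each parameter. A minimal sketch for inspecting such an index, assuming the layout transformers writes for sharded safetensors checkpoints (a `weight_map` object from parameter name to shard file name, plus `metadata.total_size`); `summarize_shard_index` is a hypothetical helper, not a library function:

```python
import json
from collections import Counter
from pathlib import Path


def summarize_shard_index(index_path):
    """Return ({shard_file: tensor_count}, total_size) for a *.safetensors.index.json."""
    index = json.loads(Path(index_path).read_text())
    # weight_map: parameter name -> shard file name; count parameters per shard.
    per_shard = Counter(index["weight_map"].values())
    # total_size is optional here so the helper also works on partial indexes.
    total_size = index.get("metadata", {}).get("total_size")
    return dict(sorted(per_shard.items())), total_size
```

For this run, calling it on `saves/LLaMA3-8B/full/train_2024-07-10-15-21-44_llama3/model.safetensors.index.json` should list four shard files. Loading that directory with `from_pretrained` in transformers resolves the shards through the same index, so reading it by hand is mainly useful for debugging.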