diff --git "a/res_ultrachat_13B.txt" "b/res_ultrachat_13B.txt" new file mode 100644--- /dev/null +++ "b/res_ultrachat_13B.txt" @@ -0,0 +1,1894 @@ + +/leonardo/prod/spack/03/ccsdeploy/hosts/cineca.it/BA/setup-var.sh: riga 61: $(tty): redirezione ambigua +Node IP: 10.6.1.54 +master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified. +WARNING:torch.distributed.run: +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs: + entrypoint : finetune_chat_llama_13B.py + min_nodes : 3 + max_nodes : 3 + nproc_per_node : 4 + run_id : 29353 + rdzv_backend : c10d + rdzv_endpoint : 10.6.1.54:29500 + rdzv_configs : {'timeout': 900} + max_restarts : 0 + monitor_interval : 5 + log_dir : None + metrics_cfg : {} + +INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_szpr_trh/29353_fzy75qgg +INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python3 +INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group +master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified. +WARNING:torch.distributed.run: +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. +***************************************** +INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs: + entrypoint : finetune_chat_llama_13B.py + min_nodes : 3 + max_nodes : 3 + nproc_per_node : 4 + run_id : 29353 + rdzv_backend : c10d + rdzv_endpoint : 10.6.1.54:29500 + rdzv_configs : {'timeout': 900} + max_restarts : 0 + monitor_interval : 5 + log_dir : None + metrics_cfg : {} + +INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_9wl7vyc5/29353_wgoojszx +INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python3 +INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group +master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified. +WARNING:torch.distributed.run: +***************************************** +Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
+***************************************** +INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs: + entrypoint : finetune_chat_llama_13B.py + min_nodes : 3 + max_nodes : 3 + nproc_per_node : 4 + run_id : 29353 + rdzv_backend : c10d + rdzv_endpoint : 10.6.1.54:29500 + rdzv_configs : {'timeout': 900} + max_restarts : 0 + monitor_interval : 5 + log_dir : None + metrics_cfg : {} + +INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_wpk04vb4/29353_qulo0whx +INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python3 +INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group +INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result: + restart_count=0 + master_addr=lrdn2110-net6-3.leonardo.local + master_port=37919 + group_rank=0 + group_world_size=3 + local_ranks=[0, 1, 2, 3] + role_ranks=[0, 1, 2, 3] + global_ranks=[0, 1, 2, 3] + role_world_sizes=[12, 12, 12, 12] + global_world_sizes=[12, 12, 12, 12] + +INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group +INFO:torch.distributed.elastic.agent.server.local_elastic_agent:Starting a FileTimerServer with /tmp/watchdog_timer_f92edb7d-4d3d-442d-aad6-058dd23fc6be ... +INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result: + restart_count=0 + master_addr=lrdn2110-net6-3.leonardo.local + master_port=37919 + group_rank=1 + group_world_size=3 + local_ranks=[0, 1, 2, 3] + role_ranks=[4, 5, 6, 7] + global_ranks=[4, 5, 6, 7] + role_world_sizes=[12, 12, 12, 12] + global_world_sizes=[12, 12, 12, 12] + +INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group +INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result: + restart_count=0 + master_addr=lrdn2110-net6-3.leonardo.local + master_port=37919 + group_rank=2 + group_world_size=3 + local_ranks=[0, 1, 2, 3] + role_ranks=[8, 9, 10, 11] + global_ranks=[8, 9, 10, 11] + role_world_sizes=[12, 12, 12, 12] + global_world_sizes=[12, 12, 12, 12] + +INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group +INFO:torch.distributed.elastic.agent.server.local_elastic_agent:Starting a FileTimerServer with /tmp/watchdog_timer_63960e88-d48a-42f3-ba13-7de9376a3825 ... +INFO:torch.distributed.elastic.agent.server.local_elastic_agent:FileTimerServer started +INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_szpr_trh/29353_fzy75qgg/attempt_0/0/error.json +INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_szpr_trh/29353_fzy75qgg/attempt_0/1/error.json +INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_szpr_trh/29353_fzy75qgg/attempt_0/2/error.json +INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_szpr_trh/29353_fzy75qgg/attempt_0/3/error.json +INFO:torch.distributed.elastic.agent.server.local_elastic_agent:Starting a FileTimerServer with /tmp/watchdog_timer_b96ab573-7472-4b9b-bb4a-dd5e32bb7a2d ... 
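+The launch banner above is the elastic agent echoing its LaunchConfig: a fixed three-node job, four workers per node (world size 12), c10d rendezvous at 10.6.1.54:29500, and no restarts on failure. A minimal sketch of a launcher call that would reproduce this configuration via the public torch.distributed.launcher API follows; how the job was actually submitted (likely torchrun under SLURM) is not in this log, so everything except the logged config values is an assumption.
+# Sketch (assumptions: everything except the config values printed above).
+from torch.distributed.launcher.api import LaunchConfig, elastic_launch
+
+config = LaunchConfig(
+    min_nodes=3,
+    max_nodes=3,                      # min == max: no elasticity
+    nproc_per_node=4,                 # 4 GPUs per node -> 12 ranks total
+    run_id="29353",
+    rdzv_backend="c10d",
+    rdzv_endpoint="10.6.1.54:29500",  # the "Node IP" printed above
+    rdzv_configs={"timeout": 900},
+    max_restarts=0,
+    monitor_interval=5,
+)
+
+def main():
+    ...  # stand-in for the finetune_chat_llama_13B.py entrypoint
+
+elastic_launch(config, main)()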
+INFO:torch.distributed.elastic.agent.server.local_elastic_agent:FileTimerServer started
+INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_szpr_trh/29353_fzy75qgg/attempt_0/0/error.json
+INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_szpr_trh/29353_fzy75qgg/attempt_0/1/error.json
+INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_szpr_trh/29353_fzy75qgg/attempt_0/2/error.json
+INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_szpr_trh/29353_fzy75qgg/attempt_0/3/error.json
+INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_9wl7vyc5/29353_wgoojszx/attempt_0/0/error.json
+INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_9wl7vyc5/29353_wgoojszx/attempt_0/1/error.json
+INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_9wl7vyc5/29353_wgoojszx/attempt_0/2/error.json
+INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_9wl7vyc5/29353_wgoojszx/attempt_0/3/error.json
+INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_wpk04vb4/29353_qulo0whx/attempt_0/0/error.json
+INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_wpk04vb4/29353_qulo0whx/attempt_0/1/error.json
+INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_wpk04vb4/29353_qulo0whx/attempt_0/2/error.json
+INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_wpk04vb4/29353_qulo0whx/attempt_0/3/error.json
+INFO
+WARN
+Clearing GPU cache for all ranks
+--> Running with torch dist debug set to detail
+Using device: cuda
+
+################{'': 'cuda:0'}################
+################{'': 'cuda:1'}################
+################{'': 'cuda:2'}################
+################{'': 'cuda:3'}################
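+The `################{'': 'cuda:N'}################` banners are each rank printing its device_map: the empty-string key maps the entire model onto that rank's local GPU. A sketch of how such a map is commonly built under the elastic launcher (the variable names are assumptions, not taken from finetune_chat_llama_13B.py):
+# Sketch (assumption): per-rank device_map as printed in the banners above.
+import os
+import torch
+
+local_rank = int(os.environ.get("LOCAL_RANK", "0"))  # set by the launcher
+torch.cuda.set_device(local_rank)
+device_map = {"": f"cuda:{local_rank}"}  # '' = place the whole model here
+print(f"################{device_map}################")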
+loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/config.json
+Model config LlamaConfig {
+  "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA",
+  "architectures": [
+    "LlamaForCausalLM"
+  ],
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "hidden_act": "silu",
+  "hidden_size": 5120,
+  "initializer_range": 0.02,
+  "intermediate_size": 13824,
+  "max_position_embeddings": 4096,
+  "model_type": "llama",
+  "num_attention_heads": 40,
+  "num_hidden_layers": 40,
+  "num_key_value_heads": 40,
+  "pad_token_id": 0,
+  "pretraining_tp": 1,
+  "rms_norm_eps": 1e-05,
+  "rope_scaling": null,
+  "tie_word_embeddings": false,
+  "torch_dtype": "float16",
+  "transformers_version": "4.31.0",
+  "use_cache": true,
+  "vocab_size": 32000
+}
+
+Detected PIL version 10.1.0
+Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning.
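+The override warning above is benign and, as the message itself says, disappears if the dtype is passed explicitly alongside the 8-bit flag. A minimal sketch (argument choices beyond the model path and dtype are assumptions):
+# Sketch (assumption): explicit torch_dtype silences the bitsandbytes warning.
+import os
+import torch
+from transformers import AutoModelForCausalLM
+
+model_path = "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA"
+local_rank = int(os.environ.get("LOCAL_RANK", "0"))
+model = AutoModelForCausalLM.from_pretrained(
+    model_path,
+    load_in_8bit=True,           # matches "Detected 8-bit loading" below
+    torch_dtype=torch.float16,   # removes the override warning above
+    device_map={"": f"cuda:{local_rank}"},
+)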
+loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json
+Instantiating LlamaForCausalLM model under default dtype torch.float16.
+Generate config GenerationConfig {
+  "_from_model_config": true,
+  "bos_token_id": 1,
+  "eos_token_id": 2,
+  "pad_token_id": 0,
+  "transformers_version": "4.31.0"
+}
+
+Detected 8-bit loading: activating 8-bit loading for this model
+Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
+Assigning </s> to the pad_token key of the tokenizer
+Dataset({
+    features: ['id', 'data'],
+    num_rows: 512837
+})
+Dataset({
+    features: ['text'],
+    num_rows: 512837
+})
+NVIDIA PG506-242 cuda:3 Allocated: 7.4 GB
+NVIDIA PG506-243 cuda:1 Allocated: 7.4 GB
+NVIDIA PG506-242 cuda:2 Allocated: 7.4 GB
+PyTorch: setting up devices
+/leonardo/home/userexternal/mpoligna/.local/lib/python3.10/site-packages/peft/utils/other.py:102: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.
+  warnings.warn(
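+Llama tokenizers ship without a pad token, hence the "Assigning </s> to the pad_token key" line: something has to fill that slot before padded batching. A common one-liner (whether the script reuses EOS or adds a dedicated token is an assumption):
+# Sketch (assumption): give the Llama tokenizer a pad token by reusing EOS.
+from transformers import AutoTokenizer
+
+tokenizer = AutoTokenizer.from_pretrained(
+    "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA"
+)
+if tokenizer.pad_token is None:
+    tokenizer.pad_token = tokenizer.eos_token  # </s>, id 2 in the config above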
+NVIDIA PG506-243 cuda:0 Allocated: 7.4 GB
+NVIDIA PG506-242 cuda:3 Allocated: 8.2 GB
+NVIDIA PG506-243 cuda:1 Allocated: 8.2 GB
+NVIDIA PG506-242 cuda:2 Allocated: 8.2 GB
+/leonardo/home/userexternal/mpoligna/.local/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:227: UserWarning: You passed `packing=True` to the SFTTrainer, and you are training your model with `max_steps` strategy. The dataset will be iterated until the `max_steps` are reached.
+  warnings.warn(
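+The PEFT FutureWarning above is a pure rename: prepare_model_for_int8_training now forwards to prepare_model_for_kbit_training. Notably, the "Number of trainable parameters = 52,428,800" reported at training start below is exactly what LoRA with r=64 on q_proj and v_proj gives for this architecture: 64 x (5120 + 5120) x 2 projections x 40 layers. A sketch of the warning-free setup (the LoRA rank and targets are inferred from that count; the remaining hyperparameters are assumptions):
+# Sketch: renamed PEFT helper plus a LoRA config consistent with the logged
+# trainable-parameter count. `model` is the 8-bit model from the sketch above.
+from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+
+model = prepare_model_for_kbit_training(model)  # was prepare_model_for_int8_training
+lora_config = LoraConfig(
+    r=64,                                # 64*(5120+5120)*2*40 = 52,428,800
+    lora_alpha=16,                       # assumption
+    target_modules=["q_proj", "v_proj"],
+    lora_dropout=0.05,                   # assumption
+    bias="none",
+    task_type="CAUSAL_LM",
+)
+model = get_peft_model(model, lora_config)
+model.print_trainable_parameters()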
+You have loaded a model on multiple GPUs. `is_model_parallel` attribute will be force-set to `True` to avoid any unexpected behavior such as device placement mismatching.
+The model is loaded in 8-bit precision. To train this model you need to add additional modules inside the model such as adapters using `peft` library and freeze the model weights. Please check the examples in https://github.com/huggingface/peft for more details.
+max_steps is given, it will override any value given in num_train_epochs
+NVIDIA PG506-243 cuda:0 Allocated: 8.2 GB
+Currently training with a batch size of: 8
+NCCL version 2.14.3+cuda11.7
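+The SFTTrainer UserWarning above is informational: with packing=True and a step budget, the packed stream is iterated until max_steps is hit rather than for a whole number of epochs. A trainer setup consistent with the logged run follows (per-device batch 8 and max_steps 15,000 come from the log; output_dir, max_seq_length, and the rest are assumptions):
+# Sketch (assumptions noted inline): SFT setup matching the warnings above.
+from transformers import TrainingArguments
+from trl import SFTTrainer
+
+training_args = TrainingArguments(
+    output_dir="Llamantino-2-13b-chat-hf-ITA_UltraB_final",
+    per_device_train_batch_size=8,   # "Currently training with a batch size of: 8"
+    gradient_accumulation_steps=1,
+    max_steps=15000,                 # overrides num_train_epochs, as logged
+)
+trainer = SFTTrainer(
+    model=model,                     # PEFT-wrapped 8-bit model from above
+    train_dataset=dataset,           # the 512,837-row dataset with a 'text' field
+    dataset_text_field="text",
+    packing=True,                    # triggers the UserWarning above
+    max_seq_length=1024,             # assumption; not recoverable from the log
+    args=training_args,
+)
+trainer.train()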
+***** Running training *****
+  Num examples = 512,837
+  Num Epochs = 3
+  Instantaneous batch size per device = 8
+  Total train batch size (w. parallel, distributed & accumulation) = 96
+  Gradient Accumulation steps = 1
+  Total optimization steps = 15,000
+  Number of trainable parameters = 52,428,800
+  0%|          | 0/15000 [00:00<?, ?it/s]
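+The training summary above is internally consistent: 8 samples per device across 12 ranks with no gradient accumulation gives the reported total batch of 96, and 15,000 steps x 96 = 1,440,000 packed samples, about 2.8 passes over the 512,837 examples, which the Trainer reports (rounded up) as 3 epochs. As a quick check:
+# Quick consistency check of the "Running training" summary above.
+per_device = 8
+world_size = 3 * 4                 # 3 nodes x 4 GPUs
+grad_accum = 1
+total_batch = per_device * world_size * grad_accum
+assert total_batch == 96           # "Total train batch size ... = 96"
+epochs = 15000 * total_batch / 512837
+print(total_batch, round(epochs, 2))   # 96 2.81 -> logged as "Num Epochs = 3"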
+loading file tokenizer.model
+loading file tokenizer.json
+loading file added_tokens.json
+loading file special_tokens_map.json
+loading file tokenizer_config.json
+Assigning </s> to the pad_token key of the tokenizer
+Configuration saved in Llamantino-2-13b-chat-hf-ITA_UltraB_final/config.json
+Configuration saved in Llamantino-2-13b-chat-hf-ITA_UltraB_final/generation_config.json
+The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at Llamantino-2-13b-chat-hf-ITA_UltraB_final/model.safetensors.index.json.
+The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at Llamantino-2-13b-chat-hf-ITA_UltraB_final/pytorch_model.bin.index.json.
+tokenizer config file saved in Llamantino-2-13b-chat-hf-ITA_UltraB_final/tokenizer_config.json
+Special tokens file saved in Llamantino-2-13b-chat-hf-ITA_UltraB_final/special_tokens_map.json
+INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
+INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
+INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0009827613830566406 seconds
+INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 48.7719452381134 seconds
+INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 38.99942111968994 seconds
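+The sharding and tokenizer-save messages above come from save_pretrained, which splits any checkpoint over the default 10GB shard limit and writes the index files named in the log. A minimal sketch of the final save; merging the LoRA adapters into a full-precision base before saving is an assumption about how the full 13B checkpoint was produced, and `adapter_dir` is a hypothetical path:
+# Sketch (assumptions: `adapter_dir` is hypothetical; `tokenizer` is the one
+# from the pad-token sketch above).
+import torch
+from peft import PeftModel
+from transformers import AutoModelForCausalLM
+
+model_path = "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA"
+out_dir = "Llamantino-2-13b-chat-hf-ITA_UltraB_final"
+adapter_dir = "checkpoints/ultrab_lora"  # hypothetical adapter location
+base = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
+merged = PeftModel.from_pretrained(base, adapter_dir).merge_and_unload()
+merged.save_pretrained(out_dir, max_shard_size="10GB")  # -> 3 shards + index file
+tokenizer.save_pretrained(out_dir)  # tokenizer_config.json + special_tokens_map.json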