/leonardo/prod/spack/03/ccsdeploy/hosts/cineca.it/BA/setup-var.sh: line 61: $(tty): ambiguous redirect
Node IP: 10.6.1.54
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint       : finetune_chat_llama_13B.py
  min_nodes        : 3
  max_nodes        : 3
  nproc_per_node   : 4
  run_id           : 29353
  rdzv_backend     : c10d
  rdzv_endpoint    : 10.6.1.54:29500
  rdzv_configs     : {'timeout': 900}
  max_restarts     : 0
  monitor_interval : 5
  log_dir          : None
  metrics_cfg      : {}
  [the master_addr notice, OMP_NUM_THREADS warning, and launch-config dump are printed once per node; the two verbatim copies from the other agents are omitted]
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_szpr_trh/29353_fzy75qgg
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_9wl7vyc5/29353_wgoojszx
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_wpk04vb4/29353_qulo0whx
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python3   [once per agent]
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group   [once per agent]
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers.
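For reference, the launch configuration dumped above corresponds to a torchrun invocation along the following lines. This is a sketch reconstructed from the config dump; the actual submission script is not part of this log, and the programmatic LaunchConfig form is an equivalent expression, not necessarily what was run:

    # Reconstructed torchrun command line (assumed, not from the job script):
    #   torchrun --nnodes=3 --nproc_per_node=4 --rdzv_backend=c10d \
    #            --rdzv_endpoint=10.6.1.54:29500 --rdzv_id=29353 \
    #            --max_restarts=0 finetune_chat_llama_13B.py
    # The same settings via the launcher API whose dump appears above:
    from torch.distributed.launcher.api import LaunchConfig

    config = LaunchConfig(
        min_nodes=3,
        max_nodes=3,
        nproc_per_node=4,
        run_id="29353",
        rdzv_backend="c10d",
        rdzv_endpoint="10.6.1.54:29500",
        rdzv_configs={"timeout": 900},
        max_restarts=0,
        monitor_interval=5,
    )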
Result:
  restart_count=0
  master_addr=lrdn2110-net6-3.leonardo.local
  master_port=37919
  group_rank=0  group_world_size=3
  local_ranks=[0, 1, 2, 3]  role_ranks=[0, 1, 2, 3]  global_ranks=[0, 1, 2, 3]
  role_world_sizes=[12, 12, 12, 12]  global_world_sizes=[12, 12, 12, 12]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:Starting a FileTimerServer with /tmp/watchdog_timer_f92edb7d-4d3d-442d-aad6-058dd23fc6be ...
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers.
Result:
  restart_count=0
  master_addr=lrdn2110-net6-3.leonardo.local
  master_port=37919
  group_rank=1  group_world_size=3
  local_ranks=[0, 1, 2, 3]  role_ranks=[4, 5, 6, 7]  global_ranks=[4, 5, 6, 7]
  role_world_sizes=[12, 12, 12, 12]  global_world_sizes=[12, 12, 12, 12]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers.
Result:
  restart_count=0
  master_addr=lrdn2110-net6-3.leonardo.local
  master_port=37919
  group_rank=2  group_world_size=3
  local_ranks=[0, 1, 2, 3]  role_ranks=[8, 9, 10, 11]  global_ranks=[8, 9, 10, 11]
  role_world_sizes=[12, 12, 12, 12]  global_world_sizes=[12, 12, 12, 12]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:Starting a FileTimerServer with /tmp/watchdog_timer_63960e88-d48a-42f3-ba13-7de9376a3825 ...
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:FileTimerServer started
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_szpr_trh/29353_fzy75qgg/attempt_0/0/error.json
  [workers 1-3 get matching reply files at .../attempt_0/{1,2,3}/error.json]
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:Starting a FileTimerServer with /tmp/watchdog_timer_b96ab573-7472-4b9b-bb4a-dd5e32bb7a2d ...
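The three Result blocks fit together as 3 agents x 4 workers = 12 global ranks; each worker's global rank follows from its agent's group_rank and its local rank. A quick check of the bookkeeping:

    # Rank layout implied by the rendezvous results above.
    nnodes, nproc_per_node = 3, 4
    world_size = nnodes * nproc_per_node  # 12, matching global_world_sizes
    for group_rank in range(nnodes):
        ranks = [group_rank * nproc_per_node + lr for lr in range(nproc_per_node)]
        print(group_rank, ranks)
    # 0 [0, 1, 2, 3]
    # 1 [4, 5, 6, 7]
    # 2 [8, 9, 10, 11]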
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:FileTimerServer started   [on each remaining agent]
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_9wl7vyc5/29353_wgoojszx/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_wpk04vb4/29353_qulo0whx/attempt_0/0/error.json
  [workers 1-3 on each node get matching reply files at .../attempt_0/{1,2,3}/error.json]
--> Running with torch dist debug set to detail
Clearing GPU cache for all ranks
Using device: cuda
################{'': 'cuda:0'}################
################{'': 'cuda:1'}################
################{'': 'cuda:2'}################
################{'': 'cuda:3'}################
  [the "Clearing GPU cache" / "Using device" / device-map banner is printed by every rank; the interleaved per-rank copies and stray "INFO WARN" fragments are collapsed here]
loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/config.json
Model config LlamaConfig {
  "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA",
  "architectures": [ "LlamaForCausalLM" ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 40,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.31.0",
  "use_cache": true,
  "vocab_size": 32000
}
  [the config load and dump are repeated verbatim by all 12 ranks; one copy is kept]
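The ################{'': 'cuda:N'}################ banners are Hugging Face device_map dicts: an empty-string key places the entire model on one GPU, here the worker's local GPU. A sketch of how each rank would build the map (assumed, since the script itself is not shown; LOCAL_RANK is set by the elastic agent for every worker):

    import os

    local_rank = int(os.environ.get("LOCAL_RANK", "0"))  # 0..3 on each node
    device_map = {"": f"cuda:{local_rank}"}              # whole model on this rank's GPU
    print(f"################{device_map}################")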
"/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 13824, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/config.json Model config LlamaConfig { loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 13824, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 13824, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 13824, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 5120, 
"initializer_range": 0.02, "intermediate_size": 13824, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 13824, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 13824, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 13824, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 13824, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA", 
"architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 13824, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 13824, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 40, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. 
Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA/model.safetensors.index.json Instantiating LlamaForCausalLM model under default dtype torch.float16. Instantiating LlamaForCausalLM model under default dtype torch.float16. Instantiating LlamaForCausalLM model under default dtype torch.float16. 
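A minimal loading sketch consistent with these lines: load_in_8bit without an explicit torch_dtype triggers the bitsandbytes override warning, and passing torch_dtype=torch.float16, as the warning itself suggests, removes it. Details beyond what the log shows are assumptions:

    import torch
    from transformers import AutoModelForCausalLM

    model_path = "/leonardo_work/IscrC_fineNLP/llamantino_chat/13B_MODELS_final/Llamantino-2-13b-chat-hf-ITA"
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        load_in_8bit=True,           # "Detected 8-bit loading" below
        torch_dtype=torch.float16,   # silences the override warning above
        device_map=device_map,       # {'': 'cuda:<local_rank>'} from the banner
    )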
Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.31.0"
}
  [config dump repeated by all 12 ranks]
Detected 8-bit loading: activating 8-bit loading for this model   [x12]
Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
  [each rank loads 3 shards; the progress output from 0% to completion, and the token in the following message, were swallowed in capture]
Assigning [token lost in capture] to the pad_token key of the tokenizer   [x12]
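The "Assigning ... to the pad_token key of the tokenizer" message lost its token string to the capture (an angle-bracketed token was eaten as markup). Given pad_token_id 0 in the config, the assignment presumably resembles the following; the concrete token is an assumption, not recoverable from the log:

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path)
    if tokenizer.pad_token is None:
        # Hypothetical choice: id 0 in the LLaMA vocabulary is <unk>.
        tokenizer.pad_token = tokenizer.unk_token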
Dataset({
    features: ['id', 'data'],
    num_rows: 512837
})
Dataset({
    features: ['text'],
    num_rows: 512837
})
  [both dataset dumps are printed by every rank]
NVIDIA PG506-243 cuda:0 Allocated: 7.4 GB
NVIDIA PG506-243 cuda:1 Allocated: 7.4 GB
NVIDIA PG506-242 cuda:2 Allocated: 7.4 GB
NVIDIA PG506-242 cuda:3 Allocated: 7.4 GB
  [7.4 GB of the 8-bit model allocated per GPU; the equivalent lines from the other nodes are omitted]
PyTorch: setting up devices   [x12]
/leonardo/home/userexternal/mpoligna/.local/lib/python3.10/site-packages/peft/utils/other.py:102: FutureWarning: prepare_model_for_int8_training is deprecated and will be removed in a future version. Use prepare_model_for_kbit_training instead.
  warnings.warn(   [x12]
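The FutureWarning above names its own fix; the deprecated call and its replacement take the same model. A sketch of the replacement:

    from peft import prepare_model_for_kbit_training

    # Replacement for the deprecated prepare_model_for_int8_training:
    model = prepare_model_for_kbit_training(model)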
NVIDIA PG506-243 cuda:0 Allocated: 8.2 GB
NVIDIA PG506-243 cuda:1 Allocated: 8.2 GB
NVIDIA PG506-242 cuda:2 Allocated: 8.2 GB
NVIDIA PG506-242 cuda:3 Allocated: 8.2 GB
  [8.2 GB per GPU after trainer setup; per-rank repeats omitted]
/leonardo/home/userexternal/mpoligna/.local/lib/python3.10/site-packages/trl/trainer/sft_trainer.py:227: UserWarning: You passed `packing=True` to the SFTTrainer, and you are training your model with `max_steps` strategy. The dataset will be iterated until the `max_steps` are reached.
  warnings.warn(   [x12]
You have loaded a model on multiple GPUs. `is_model_parallel` attribute will be force-set to `True` to avoid any unexpected behavior such as device placement mismatching.
The model is loaded in 8-bit precision. To train this model you need to add additional modules inside the model such as adapters using `peft` library and freeze the model weights. Please check the examples in https://github.com/huggingface/peft for more details.
max_steps is given, it will override any value given in num_train_epochs
Currently training with a batch size of: 8
NCCL version 2.14.3+cuda11.7
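A trainer sketch consistent with the warnings above: packing=True plus a max_steps budget (which overrides num_train_epochs) and a per-device batch size of 8. Values not visible in the log, and the model/dataset variables carried over from the earlier steps, are assumptions:

    from transformers import TrainingArguments
    from trl import SFTTrainer

    args = TrainingArguments(
        output_dir="Llamantino-2-13b-chat-hf-ITA_UltraB_final",
        per_device_train_batch_size=8,   # "Currently training with a batch size of: 8"
        gradient_accumulation_steps=1,
        max_steps=15_000,                # overrides num_train_epochs
    )
    trainer = SFTTrainer(
        model=model,                     # the 8-bit model prepared above
        args=args,
        train_dataset=dataset,           # the 512,837-row 'text' dataset
        dataset_text_field="text",
        packing=True,                    # triggers the sft_trainer.py:227 warning
    )
    trainer.train()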
  [the remaining agents repeat the same shard-loading, pad_token, dataset, allocation, and warning sequence; those duplicates are omitted]
***** Running training *****
  Num examples = 512,837
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 96
  Gradient Accumulation steps = 1
  Total optimization steps = 15,000
  Number of trainable parameters = 52,428,800
  [banner printed once per node]
  0%|          | 0/15000 [00:00<?, ?it/s]
  [the training progress output up to step 15,000 was swallowed in capture]
Assigning [token lost in capture] to the pad_token key of the tokenizer
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
Configuration saved in Llamantino-2-13b-chat-hf-ITA_UltraB_final/config.json
Configuration saved in Llamantino-2-13b-chat-hf-ITA_UltraB_final/generation_config.json
  [tokenizer reload and config saves repeated by several saving ranks]
The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at Llamantino-2-13b-chat-hf-ITA_UltraB_final/model.safetensors.index.json.
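Cross-checking the training banner's numbers: the total batch of 96 is the per-device batch times the 12-way data parallelism, and 15,000 steps at that size is roughly three passes over 512,837 examples (with packing the correspondence is approximate):

    per_device, world_size, grad_accum = 8, 12, 1
    total_batch = per_device * world_size * grad_accum
    print(total_batch)                      # 96, as in the banner
    print(15_000 * total_batch / 512_837)   # ~2.81, reported as "Num Epochs = 3"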
The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 3 checkpoint shards. You can find where each parameters has been saved in the index located at Llamantino-2-13b-chat-hf-ITA_UltraB_final/pytorch_model.bin.index.json.
  [shard-split message repeated per saving rank; some saves wrote a safetensors index, others a pytorch_model.bin index]
tokenizer config file saved in Llamantino-2-13b-chat-hf-ITA_UltraB_final/tokenizer_config.json
Special tokens file saved in Llamantino-2-13b-chat-hf-ITA_UltraB_final/special_tokens_map.json
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
  [finish/wait pair printed by each of the three agents]
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0009827613830566406 seconds
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 48.7719452381134 seconds
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 38.99942111968994 seconds
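A sketch of the save producing the shard messages above: anything over the default 10GB shard size is split, with an index file mapping each parameter to its shard. Both a model.safetensors.index.json and a pytorch_model.bin.index.json appear in the log, consistent with save calls differing in safe_serialization; the exact calls are assumptions:

    save_dir = "Llamantino-2-13b-chat-hf-ITA_UltraB_final"
    # 13B fp16 weights exceed the 10GB default shard size -> 3 shards plus index.
    model.save_pretrained(save_dir, max_shard_size="10GB", safe_serialization=True)
    tokenizer.save_pretrained(save_dir)  # tokenizer_config.json + special_tokens_map.json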