/leonardo/prod/spack/03/ccsdeploy/hosts/cineca.it/BA/setup-var.sh: line 61: $(tty): ambiguous redirect
Node IP: 10.2.0.16
master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint       : finetune_chat_llama.py
  min_nodes        : 3
  max_nodes        : 3
  nproc_per_node   : 4
  run_id           : 23926
  rdzv_backend     : c10d
  rdzv_endpoint    : 10.2.0.16:29500
  rdzv_configs     : {'timeout': 900}
  max_restarts     : 0
  monitor_interval : 5
  log_dir          : None
  metrics_cfg      : {}
(The master_addr notice, the OMP_NUM_THREADS warning and the launch config above are printed once by the elastic agent on each of the three nodes.)
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_6c279b0l/23926_xzvcwefg
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python3
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
(The log-directory, starting-workers and rendezvous lines are repeated by the agents on the other two nodes, whose log directories are /tmp/torchelastic_ucpsup42/23926_iniqycen and /tmp/torchelastic_sibvbw07/23926_vrqggxt_.)
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers.
Result: restart_count=0 master_addr=lrdn0376-net2-3.leonardo.local master_port=39889 group_rank=0 group_world_size=3 local_ranks=[0, 1, 2, 3] role_ranks=[0, 1, 2, 3] global_ranks=[0, 1, 2, 3] role_world_sizes=[12, 12, 12, 12] global_world_sizes=[12, 12, 12, 12]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:Starting a FileTimerServer with /tmp/watchdog_timer_302a2159-8781-4433-8620-997ce9b9a410 ...
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers.
Result: restart_count=0 master_addr=lrdn0376-net2-3.leonardo.local master_port=39889 group_rank=2 group_world_size=3 local_ranks=[0, 1, 2, 3] role_ranks=[8, 9, 10, 11] global_ranks=[8, 9, 10, 11] role_world_sizes=[12, 12, 12, 12] global_world_sizes=[12, 12, 12, 12]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers.
Result: restart_count=0 master_addr=lrdn0376-net2-3.leonardo.local master_port=39889 group_rank=1 group_world_size=3 local_ranks=[0, 1, 2, 3] role_ranks=[4, 5, 6, 7] global_ranks=[4, 5, 6, 7] role_world_sizes=[12, 12, 12, 12] global_world_sizes=[12, 12, 12, 12]
INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:Starting a FileTimerServer with /tmp/watchdog_timer_a18747e3-cd8a-4138-860e-d0d6039f8139 ...
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:Starting a FileTimerServer with /tmp/watchdog_timer_021d5f39-514b-4f02-8c7b-b5aa72f8ad84 ...
INFO:torch.distributed.elastic.agent.server.local_elastic_agent:FileTimerServer started
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_6c279b0l/23926_xzvcwefg/attempt_0/0/error.json
(Analogous reply-file lines follow for worker1–worker3 of this agent and for worker0–worker3 of the agents using /tmp/torchelastic_ucpsup42/23926_iniqycen and /tmp/torchelastic_sibvbw07/23926_vrqggxt_, each group followed by its own "FileTimerServer started" line.)
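A minimal sketch of how the launch settings and world-size layout above (3 nodes x 4 processes per node = global world size 12) map onto torch.distributed's launcher API; the entry-point function below is illustrative, the real job runs finetune_chat_llama.py via torchrun or an equivalent wrapper:

    # Sketch only: reproduces the launch config shown in the log via the
    # torch.distributed launcher API. The train() entry point is a placeholder.
    from torch.distributed.launcher.api import LaunchConfig, elastic_launch

    def train(*args):
        # stand-in for the real entry point (finetune_chat_llama.py)
        pass

    config = LaunchConfig(
        min_nodes=3,
        max_nodes=3,
        nproc_per_node=4,                 # 3 nodes x 4 GPUs -> world size 12
        run_id="23926",
        rdzv_backend="c10d",
        rdzv_endpoint="10.2.0.16:29500",
        rdzv_configs={"timeout": 900},
        max_restarts=0,
        monitor_interval=5,
    )

    if __name__ == "__main__":
        elastic_launch(config, train)()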
Clearing GPU cache for all ranks
--> Running with torch dist debug set to detail
Using device: cuda
################{'': 'cuda:0'}################
(Every rank prints its own "Using device: cuda" line and device map, {'': 'cuda:0'} through {'': 'cuda:3'} on each node.)
loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/config.json
Model config LlamaConfig {
  "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.31.0",
  "use_cache": true,
  "vocab_size": 32000
}
(The configuration-file line and the LlamaConfig dump above are printed identically by every rank.)
"max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, 
"num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } loading configuration file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/config.json Model config LlamaConfig { "_name_or_path": "/leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA", "architectures": [ "LlamaForCausalLM" ], "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 4096, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-05, "rope_scaling": null, "tie_word_embeddings": false, "torch_dtype": "float16", 
"transformers_version": "4.31.0", "use_cache": true, "vocab_size": 32000 } Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Detected PIL version 10.1.0 Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/model.safetensors.index.json loading weights file /leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA/model.safetensors.index.json Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. 
Instantiating LlamaForCausalLM model under default dtype torch.float16.
Generate config GenerationConfig {
  "_from_model_config": true,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0,
  "transformers_version": "4.31.0"
}
(The instantiation line and the GenerationConfig dump are likewise repeated by every rank.)
Detected 8-bit loading: activating 8-bit loading for this model
(Printed once per rank.)
Loading checkpoint shards: 0%| | 0/2 [00:00
Assigning to the pad_token key of the tokenizer
(The pad_token assignment line is repeated by every rank.)
Dataset({
  features: ['id', 'data'],
  num_rows: 512837
})
Map (num_proc=4): 0%| | 0/512837 [00:00
loading file tokenizer.model
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
Configuration saved in Llamantino-2-7b-chat-hf-ITA_UltraB_final/config.json
Configuration saved in Llamantino-2-7b-chat-hf-ITA_UltraB_final/generation_config.json
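A minimal sketch of the tokenizer padding fix and the parallel dataset mapping implied by the lines above; only num_proc=4 and the 'id'/'data' columns come from the log, while the data file name, the chosen pad token and the tokenization details are illustrative assumptions:

    # Sketch only: the pad-token assignment and Map(num_proc=4) seen in the log.
    # Which pad token the script actually assigns is not recoverable from the paste.
    from datasets import load_dataset
    from transformers import AutoTokenizer

    model_path = "/leonardo_work/IscrC_fineNLP/llamantino_chat/7B_MODELS_final/Llamantino-2-7b-chat-hf-ITA"
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    if tokenizer.pad_token is None:
        # emits "Assigning ... to the pad_token key of the tokenizer"
        tokenizer.add_special_tokens({"pad_token": tokenizer.eos_token})

    dataset = load_dataset("json", data_files="chat_data.json", split="train")  # hypothetical file

    def tokenize(example):
        # assumes the 'data' column holds the chat text to tokenize
        return tokenizer(example["data"], truncation=True, max_length=4096)

    dataset = dataset.map(tokenize, num_proc=4)  # produces the "Map (num_proc=4)" progress output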
The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at Llamantino-2-7b-chat-hf-ITA_UltraB_final/model.safetensors.index.json.
The model is bigger than the maximum size per checkpoint (10GB) and is going to be split in 2 checkpoint shards. You can find where each parameters has been saved in the index located at Llamantino-2-7b-chat-hf-ITA_UltraB_final/pytorch_model.bin.index.json.
(The config/generation_config save lines and the sharding message are repeated by the saving processes; some report a safetensors index, others a pytorch_model.bin index.)
tokenizer config file saved in Llamantino-2-7b-chat-hf-ITA_UltraB_final/tokenizer_config.json
Special tokens file saved in Llamantino-2-7b-chat-hf-ITA_UltraB_final/special_tokens_map.json
INFO:torch.distributed.elastic.agent.server.api:[default] worker group successfully finished. Waiting 300 seconds for other agents to finish.
INFO:torch.distributed.elastic.agent.server.api:Local worker group finished (SUCCEEDED). Waiting 300 seconds for other agents to finish
(The two finished/waiting lines are printed by each of the three agents.)
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 7.127857685089111 seconds
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 0.0009458065032958984 seconds
INFO:torch.distributed.elastic.agent.server.api:Done waiting for other agents. Elapsed: 50.04134654998779 seconds
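A minimal sketch of the kind of save call behind the sharding messages above, continuing the model/tokenizer objects from the earlier sketches; 10GB is transformers' default max_shard_size, and whether the script wrote safetensors or pytorch_model.bin shards (or both) is only visible from the index-file names in the log:

    # Sketch only: saving a model larger than 10GB splits it into shards plus an
    # index file, which is what triggers the "bigger than the maximum size per
    # checkpoint (10GB)" message above. `model` and `tokenizer` are the objects
    # from the earlier sketches.
    output_dir = "Llamantino-2-7b-chat-hf-ITA_UltraB_final"

    model.save_pretrained(output_dir, max_shard_size="10GB")  # writes 2 shards + the index json
    tokenizer.save_pretrained(output_dir)                     # writes tokenizer_config.json, special_tokens_map.json, ...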