[INFO|per_model_mediate|p_model_medi.py:147] 2024-05-20 16:23:49,450 > Load FP Model
[INFO|_get_config_dict|configuration_utils.py:724] 2024-05-20 16:23:49,451 > loading configuration file /raid/LLM/llama3-8b/config.json
[INFO|from_dict|configuration_utils.py:789] 2024-05-20 16:23:49,451 > Model config LlamaConfig {
  "_name_or_path": "/raid/LLM/llama3-8b",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.40.2",
  "use_cache": true,
  "vocab_size": 128256
}
[INFO|from_pretrained|modeling_utils.py:3426] 2024-05-20 16:23:49,468 > loading weights file /raid/LLM/llama3-8b/model.safetensors.index.json
[INFO|_set_default_torch_dtype|modeling_utils.py:1494] 2024-05-20 16:23:49,468 > Instantiating LlamaForCausalLM model under default dtype torch.float32.
[INFO|from_dict|configuration_utils.py:928] 2024-05-20 16:23:49,469 > Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}
[INFO|_load_pretrained_model|modeling_utils.py:4170] 2024-05-20 16:24:07,162 > All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|_load_pretrained_model|modeling_utils.py:4178] 2024-05-20 16:24:07,162 > All the weights of LlamaForCausalLM were initialized from the model checkpoint at /raid/LLM/llama3-8b. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|from_pretrained|configuration_utils.py:881] 2024-05-20 16:24:07,165 > loading configuration file /raid/LLM/llama3-8b/generation_config.json
[INFO|from_dict|configuration_utils.py:928] 2024-05-20 16:24:07,165 > Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}
[INFO|per_model_mediate|p_model_medi.py:163] 2024-05-20 16:24:07,367 > Load Tokenizer
[INFO|from_pretrained|tokenization_utils_base.py:2085] 2024-05-20 16:24:07,369 > loading file tokenizer.json
[INFO|from_pretrained|tokenization_utils_base.py:2085] 2024-05-20 16:24:07,369 > loading file added_tokens.json
[INFO|from_pretrained|tokenization_utils_base.py:2085] 2024-05-20 16:24:07,369 > loading file special_tokens_map.json
[INFO|from_pretrained|tokenization_utils_base.py:2085] 2024-05-20 16:24:07,369 > loading file tokenizer_config.json
[WARNING|warning_advice|logging.py:314] 2024-05-20 16:24:07,727 > Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
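The block above is the standard transformers loading sequence for the full-precision base model and its tokenizer. A minimal sketch of the equivalent calls is shown below; the actual code in p_model_medi.py is not part of this log, so variable names here are only illustrative.

```python
# Minimal sketch of the "Load FP Model" / "Load Tokenizer" steps, assuming the
# standard transformers API (the real p_model_medi.py code is not shown in the log).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/raid/LLM/llama3-8b"  # path taken from the log above

# Full-precision reference model; the log shows it instantiated under torch.float32.
fp_model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float32)

# This triggers the "loading file tokenizer.json / ..." messages and the
# special-tokens warning at the end of the block above.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
```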
[INFO|per_model_mediate|p_model_medi.py:189] 2024-05-20 16:24:21,579 > Calibration dataset is loaded
[INFO|per_model_mediate|p_model_medi.py:190] 2024-05-20 16:24:21,579 > Samples torch.Size([320, 512])
[INFO|per_model_mediate|p_model_medi.py:191] 2024-05-20 16:24:21,579 > Samples Mask torch.Size([320, 512])
[INFO|per_model_mediate|p_model_medi.py:200] 2024-05-20 16:24:21,579 > Set Quantizer
[INFO|per_model_mediate|p_model_medi.py:205] 2024-05-20 16:24:21,581 > Done
[INFO|per_model_mediate|p_model_medi.py:207] 2024-05-20 16:24:21,581 > NoS: 320
[INFO|_get_config_dict|configuration_utils.py:724] 2024-05-20 16:27:29,844 > loading configuration file /raid/LLM/llama3-8b-quip-2bit_ft_full_ref/config.json
[INFO|from_dict|configuration_utils.py:789] 2024-05-20 16:27:29,845 > Model config LlamaConfig {
  "_name_or_path": "/raid/LLM/llama3-8b-quip-2bit_ft_full_ref",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.40.2",
  "use_cache": true,
  "vocab_size": 128256
}
[INFO|from_pretrained|modeling_utils.py:3426] 2024-05-20 16:27:29,846 > loading weights file /raid/LLM/llama3-8b-quip-2bit_ft_full_ref/pytorch_model.bin
[INFO|_set_default_torch_dtype|modeling_utils.py:1494] 2024-05-20 16:27:29,882 > Instantiating LlamaForCausalLM model under default dtype torch.float32.
[INFO|from_dict|configuration_utils.py:928] 2024-05-20 16:27:29,883 > Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}
[INFO|_load_pretrained_model|modeling_utils.py:4170] 2024-05-20 16:27:39,335 > All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|_load_pretrained_model|modeling_utils.py:4178] 2024-05-20 16:27:39,335 > All the weights of LlamaForCausalLM were initialized from the model checkpoint at /raid/LLM/llama3-8b-quip-2bit_ft_full_ref. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
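The calibration set consists of 320 sequences of 512 tokens each ("NoS: 320"), after which the quantizer is configured and the 2-bit QuIP reference checkpoint is loaded with the same from_pretrained path as above. A rough sketch of how the [320, 512] sample and mask tensors could be assembled is given below; the dataset choice (C4) is only inferred from the "c4-320" tag in the output directory name later in the log, so the real sampling code may differ.

```python
# Rough sketch of building the [320, 512] calibration tensors reported above,
# assuming C4 as the calibration corpus (an inference from the output-directory name).
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

NUM_SAMPLES, SEQ_LEN = 320, 512  # "NoS: 320" and the 512-token window from the log

tokenizer = AutoTokenizer.from_pretrained("/raid/LLM/llama3-8b")
stream = load_dataset("allenai/c4", "en", split="train", streaming=True)

samples, masks = [], []
for example in stream:
    ids = tokenizer(example["text"], return_tensors="pt").input_ids[0]
    if ids.numel() < SEQ_LEN:
        continue  # skip documents shorter than one calibration window
    samples.append(ids[:SEQ_LEN])
    masks.append(torch.ones(SEQ_LEN, dtype=torch.long))
    if len(samples) == NUM_SAMPLES:
        break

samples = torch.stack(samples)  # -> torch.Size([320, 512])
masks = torch.stack(masks)      # -> torch.Size([320, 512])
```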
[INFO|from_pretrained|configuration_utils.py:881] 2024-05-20 16:27:39,337 > loading configuration file /raid/LLM/llama3-8b-quip-2bit_ft_full_ref/generation_config.json
[INFO|from_dict|configuration_utils.py:928] 2024-05-20 16:27:39,338 > Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}
[INFO|per_model_mediate|p_model_medi.py:383] 2024-05-20 16:27:42,812 > max device: 2
[INFO|per_model_mediate|p_model_medi.py:384] 2024-05-20 16:27:42,812 > min device: 0
[INFO|per_model_mediate|p_model_medi.py:392] 2024-05-20 16:27:43,207 > Calculate Gap Weights
[INFO|per_model_mediate|p_model_medi.py:489] 2024-05-20 16:27:50,717 > Training data: 256
[INFO|per_model_mediate|p_model_medi.py:490] 2024-05-20 16:27:50,717 > Validation data: 64
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:32:02,256 > Step 25 - Validation loss: 1.911618947982788
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:35:41,736 > Step 50 - Validation loss: 1.8882404565811157
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:39:21,533 > Step 75 - Validation loss: 1.8751988410949707
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:43:01,374 > Step 100 - Validation loss: 1.857085943222046
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:46:41,220 > Step 125 - Validation loss: 1.8588143587112427
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:50:20,834 > Step 150 - Validation loss: 1.8729729652404785
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:54:00,480 > Step 175 - Validation loss: 1.888564109802246
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:57:40,028 > Step 200 - Validation loss: 1.907829999923706
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 17:01:19,614 > Step 225 - Validation loss: 1.928391456604004
[INFO|per_model_mediate|p_model_medi.py:783] 2024-05-20 17:01:19,615 > Dump Logs and Save Best Adapters
[INFO||prep.py:184] 2024-05-20 17:01:20,405 > Per model is Done
[INFO||prep.py:306] 2024-05-20 17:01:20,405 > Save Adapters
[INFO||prep.py:308] 2024-05-20 17:01:20,977 > Get Base Model
[INFO||prep.py:310] 2024-05-20 17:01:20,977 > Unwrap Base Model
[INFO||prep.py:315] 2024-05-20 17:01:20,995 > Save Tokenizer
[INFO|save_pretrained|tokenization_utils_base.py:2488] 2024-05-20 17:01:20,997 > tokenizer config file saved in /raid/LLM/llama3-8b_w2-quip-r64_c4-320_a-w-g_8-1e-4-unfinetuned-20240520_162342/tokenizer_config.json
[INFO|save_pretrained|tokenization_utils_base.py:2497] 2024-05-20 17:01:20,997 > Special tokens file saved in /raid/LLM/llama3-8b_w2-quip-r64_c4-320_a-w-g_8-1e-4-unfinetuned-20240520_162342/special_tokens_map.json
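The 320 calibration sequences are split 256/64 into training and validation sets, and validation loss is logged every 25 steps. The trace bottoms out near step 100 (about 1.857) and then climbs, so the "Save Best Adapters" step keeps the lowest-loss adapter snapshot rather than the final one. A hedged sketch of that bookkeeping follows; the helper names are placeholders, not the real API of p_model_medi.py.

```python
# Hedged sketch of the validation-loss evaluation and best-snapshot tracking implied
# by the log above (evaluation every 25 steps, keep the lowest-loss adapter state).
import torch

@torch.no_grad()
def validation_loss(model, val_samples, val_masks, batch_size=8):
    """Mean causal-LM loss over the 64 held-out calibration sequences."""
    model.eval()
    losses = []
    for i in range(0, val_samples.size(0), batch_size):
        ids = val_samples[i:i + batch_size].to(model.device)
        mask = val_masks[i:i + batch_size].to(model.device)
        # Labels equal the inputs for next-token prediction.
        out = model(input_ids=ids, attention_mask=mask, labels=ids)
        losses.append(out.loss.item())
    return sum(losses) / len(losses)

def evaluate_and_track(model, val_samples, val_masks, adapter_params, best):
    """Evaluate and keep the lowest-loss adapter snapshot.

    `best` is a (loss, state_dict) pair; the caller invokes this every 25 steps,
    matching the cadence in the log, and finally saves best[1] to disk.
    `adapter_params` is a placeholder for however the trainable adapter weights
    are collected in the real code.
    """
    loss = validation_loss(model, val_samples, val_masks)
    if loss < best[0]:
        best = (loss, {k: v.detach().cpu().clone() for k, v in adapter_params.items()})
    return best
```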
"use_cache": true, "vocab_size": 128256 } [INFO|from_pretrained|modeling_utils.py:3426] 2024-05-20 17:01:22,383 > loading weights file /raid/LLM/llama3-8b/model.safetensors.index.json [INFO|_set_default_torch_dtype|modeling_utils.py:1494] 2024-05-20 17:01:22,384 > Instantiating LlamaForCausalLM model under default dtype torch.float32. [INFO|from_dict|configuration_utils.py:928] 2024-05-20 17:01:22,384 > Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": 128001 } [INFO|_load_pretrained_model|modeling_utils.py:4170] 2024-05-20 17:01:24,676 > All model checkpoint weights were used when initializing LlamaForCausalLM. [INFO|_load_pretrained_model|modeling_utils.py:4178] 2024-05-20 17:01:24,676 > All the weights of LlamaForCausalLM were initialized from the model checkpoint at /raid/LLM/llama3-8b. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. [INFO|from_pretrained|configuration_utils.py:881] 2024-05-20 17:01:24,679 > loading configuration file /raid/LLM/llama3-8b/generation_config.json [INFO|from_dict|configuration_utils.py:928] 2024-05-20 17:01:24,680 > Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": 128001 } [INFO||prep.py:357] 2024-05-20 17:01:24,681 > Overwrite Q weight [INFO||prep.py:359] 2024-05-20 17:01:28,423 > Save Model [INFO|save_pretrained|configuration_utils.py:471] 2024-05-20 17:01:28,424 > Configuration saved in /raid/LLM/llama3-8b_w2-quip-r64_c4-320_a-w-g_8-1e-4-unfinetuned-20240520_162342/config.json [INFO|save_pretrained|configuration_utils.py:697] 2024-05-20 17:01:28,425 > Configuration saved in /raid/LLM/llama3-8b_w2-quip-r64_c4-320_a-w-g_8-1e-4-unfinetuned-20240520_162342/generation_config.json [INFO|save_pretrained|modeling_utils.py:2598] 2024-05-20 17:02:00,528 > The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /raid/LLM/llama3-8b_w2-quip-r64_c4-320_a-w-g_8-1e-4-unfinetuned-20240520_162342/model.safetensors.index.json.