[INFO|per_model_mediate|p_model_medi.py:147] 2024-05-20 16:23:49,450 > Load FP Model
[INFO|_get_config_dict|configuration_utils.py:724] 2024-05-20 16:23:49,451 > loading configuration file /raid/LLM/llama3-8b/config.json
[INFO|from_dict|configuration_utils.py:789] 2024-05-20 16:23:49,451 > Model config LlamaConfig {
  "_name_or_path": "/raid/LLM/llama3-8b",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.40.2",
  "use_cache": true,
  "vocab_size": 128256
}
[INFO|from_pretrained|modeling_utils.py:3426] 2024-05-20 16:23:49,468 > loading weights file /raid/LLM/llama3-8b/model.safetensors.index.json
[INFO|_set_default_torch_dtype|modeling_utils.py:1494] 2024-05-20 16:23:49,468 > Instantiating LlamaForCausalLM model under default dtype torch.float32.
[INFO|from_dict|configuration_utils.py:928] 2024-05-20 16:23:49,469 > Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}
[INFO|_load_pretrained_model|modeling_utils.py:4170] 2024-05-20 16:24:07,162 > All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|_load_pretrained_model|modeling_utils.py:4178] 2024-05-20 16:24:07,162 > All the weights of LlamaForCausalLM were initialized from the model checkpoint at /raid/LLM/llama3-8b. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|from_pretrained|configuration_utils.py:881] 2024-05-20 16:24:07,165 > loading configuration file /raid/LLM/llama3-8b/generation_config.json
[INFO|from_dict|configuration_utils.py:928] 2024-05-20 16:24:07,165 > Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}
[INFO|per_model_mediate|p_model_medi.py:163] 2024-05-20 16:24:07,367 > Load Tokenizer
[INFO|from_pretrained|tokenization_utils_base.py:2085] 2024-05-20 16:24:07,369 > loading file tokenizer.json
[INFO|from_pretrained|tokenization_utils_base.py:2085] 2024-05-20 16:24:07,369 > loading file added_tokens.json
[INFO|from_pretrained|tokenization_utils_base.py:2085] 2024-05-20 16:24:07,369 > loading file special_tokens_map.json
[INFO|from_pretrained|tokenization_utils_base.py:2085] 2024-05-20 16:24:07,369 > loading file tokenizer_config.json
[WARNING|warning_advice|logging.py:314] 2024-05-20 16:24:07,727 > Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
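The block above is the standard transformers loading sequence for the full-precision base model and its tokenizer. A minimal sketch of the equivalent calls is shown below; the actual code in p_model_medi.py is not part of this log, so variable names here are only illustrative.

```python
# Minimal sketch of the "Load FP Model" / "Load Tokenizer" steps, assuming the
# standard transformers API (the real p_model_medi.py code is not shown in the log).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/raid/LLM/llama3-8b"  # path taken from the log above

# Full-precision reference model; the log shows it instantiated under torch.float32.
fp_model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float32)

# This triggers the "loading file tokenizer.json / ..." messages and the
# special-tokens warning at the end of the block above.
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
```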
[INFO|per_model_mediate|p_model_medi.py:189] 2024-05-20 16:24:21,579 > Calibration dataset is loaded
[INFO|per_model_mediate|p_model_medi.py:190] 2024-05-20 16:24:21,579 > Samples torch.Size([320, 512])
[INFO|per_model_mediate|p_model_medi.py:191] 2024-05-20 16:24:21,579 > Samples Mask torch.Size([320, 512])
[INFO|per_model_mediate|p_model_medi.py:200] 2024-05-20 16:24:21,579 > Set Quantizer
[INFO|per_model_mediate|p_model_medi.py:205] 2024-05-20 16:24:21,581 > Done
[INFO|per_model_mediate|p_model_medi.py:207] 2024-05-20 16:24:21,581 > NoS: 320
[INFO|_get_config_dict|configuration_utils.py:724] 2024-05-20 16:27:29,844 > loading configuration file /raid/LLM/llama3-8b-quip-2bit_ft_full_ref/config.json
[INFO|from_dict|configuration_utils.py:789] 2024-05-20 16:27:29,845 > Model config LlamaConfig {
  "_name_or_path": "/raid/LLM/llama3-8b-quip-2bit_ft_full_ref",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float32",
  "transformers_version": "4.40.2",
  "use_cache": true,
  "vocab_size": 128256
}
[INFO|from_pretrained|modeling_utils.py:3426] 2024-05-20 16:27:29,846 > loading weights file /raid/LLM/llama3-8b-quip-2bit_ft_full_ref/pytorch_model.bin
[INFO|_set_default_torch_dtype|modeling_utils.py:1494] 2024-05-20 16:27:29,882 > Instantiating LlamaForCausalLM model under default dtype torch.float32.
[INFO|from_dict|configuration_utils.py:928] 2024-05-20 16:27:29,883 > Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}
[INFO|_load_pretrained_model|modeling_utils.py:4170] 2024-05-20 16:27:39,335 > All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|_load_pretrained_model|modeling_utils.py:4178] 2024-05-20 16:27:39,335 > All the weights of LlamaForCausalLM were initialized from the model checkpoint at /raid/LLM/llama3-8b-quip-2bit_ft_full_ref. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
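The calibration set consists of 320 sequences of 512 tokens each ("NoS: 320"), after which the quantizer is configured and the 2-bit QuIP reference checkpoint is loaded with the same from_pretrained path as above. A rough sketch of how the [320, 512] sample and mask tensors could be assembled is given below; the dataset choice (C4) is only inferred from the "c4-320" tag in the output directory name later in the log, so the real sampling code may differ.

```python
# Rough sketch of building the [320, 512] calibration tensors reported above,
# assuming C4 as the calibration corpus (an inference from the output-directory name).
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

NUM_SAMPLES, SEQ_LEN = 320, 512  # "NoS: 320" and the 512-token window from the log

tokenizer = AutoTokenizer.from_pretrained("/raid/LLM/llama3-8b")
stream = load_dataset("allenai/c4", "en", split="train", streaming=True)

samples, masks = [], []
for example in stream:
    ids = tokenizer(example["text"], return_tensors="pt").input_ids[0]
    if ids.numel() < SEQ_LEN:
        continue  # skip documents shorter than one calibration window
    samples.append(ids[:SEQ_LEN])
    masks.append(torch.ones(SEQ_LEN, dtype=torch.long))
    if len(samples) == NUM_SAMPLES:
        break

samples = torch.stack(samples)  # -> torch.Size([320, 512])
masks = torch.stack(masks)      # -> torch.Size([320, 512])
```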
[INFO|from_pretrained|configuration_utils.py:881] 2024-05-20 16:27:39,337 > loading configuration file /raid/LLM/llama3-8b-quip-2bit_ft_full_ref/generation_config.json
[INFO|from_dict|configuration_utils.py:928] 2024-05-20 16:27:39,338 > Generate config GenerationConfig {
  "bos_token_id": 128000,
  "eos_token_id": 128001
}
[INFO|per_model_mediate|p_model_medi.py:383] 2024-05-20 16:27:42,812 > max device: 2
[INFO|per_model_mediate|p_model_medi.py:384] 2024-05-20 16:27:42,812 > min device: 0
[INFO|per_model_mediate|p_model_medi.py:392] 2024-05-20 16:27:43,207 > Calculate Gap Weights
[INFO|per_model_mediate|p_model_medi.py:489] 2024-05-20 16:27:50,717 > Training data: 256
[INFO|per_model_mediate|p_model_medi.py:490] 2024-05-20 16:27:50,717 > Validation data: 64
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:32:02,256 > Step 25 - Validation loss: 1.911618947982788
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:35:41,736 > Step 50 - Validation loss: 1.8882404565811157
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:39:21,533 > Step 75 - Validation loss: 1.8751988410949707
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:43:01,374 > Step 100 - Validation loss: 1.857085943222046
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:46:41,220 > Step 125 - Validation loss: 1.8588143587112427
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:50:20,834 > Step 150 - Validation loss: 1.8729729652404785
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:54:00,480 > Step 175 - Validation loss: 1.888564109802246
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 16:57:40,028 > Step 200 - Validation loss: 1.907829999923706
[INFO|per_model_mediate|p_model_medi.py:734] 2024-05-20 17:01:19,614 > Step 225 - Validation loss: 1.928391456604004
[INFO|per_model_mediate|p_model_medi.py:783] 2024-05-20 17:01:19,615 > Dump Logs and Save Best Adapters
[INFO||prep.py:184] 2024-05-20 17:01:20,405 > Per model is Done
[INFO||prep.py:306] 2024-05-20 17:01:20,405 > Save Adapters
[INFO||prep.py:308] 2024-05-20 17:01:20,977 > Get Base Model
[INFO||prep.py:310] 2024-05-20 17:01:20,977 > Unwrap Base Model
[INFO||prep.py:315] 2024-05-20 17:01:20,995 > Save Tokenizer
[INFO|save_pretrained|tokenization_utils_base.py:2488] 2024-05-20 17:01:20,997 > tokenizer config file saved in /raid/LLM/llama3-8b_w2-quip-r64_c4-320_a-w-g_8-1e-4-unfinetuned-20240520_162342/tokenizer_config.json
[INFO|save_pretrained|tokenization_utils_base.py:2497] 2024-05-20 17:01:20,997 > Special tokens file saved in /raid/LLM/llama3-8b_w2-quip-r64_c4-320_a-w-g_8-1e-4-unfinetuned-20240520_162342/special_tokens_map.json
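The 320 calibration sequences are split 256/64 into training and validation sets, and validation loss is logged every 25 steps. The trace bottoms out near step 100 (about 1.857) and then climbs, so the "Save Best Adapters" step keeps the lowest-loss adapter snapshot rather than the final one. A hedged sketch of that bookkeeping follows; the helper names are placeholders, not the real API of p_model_medi.py.

```python
# Hedged sketch of the validation-loss evaluation and best-snapshot tracking implied
# by the log above (evaluation every 25 steps, keep the lowest-loss adapter state).
import torch

@torch.no_grad()
def validation_loss(model, val_samples, val_masks, batch_size=8):
    """Mean causal-LM loss over the 64 held-out calibration sequences."""
    model.eval()
    losses = []
    for i in range(0, val_samples.size(0), batch_size):
        ids = val_samples[i:i + batch_size].to(model.device)
        mask = val_masks[i:i + batch_size].to(model.device)
        # Labels equal the inputs for next-token prediction.
        out = model(input_ids=ids, attention_mask=mask, labels=ids)
        losses.append(out.loss.item())
    return sum(losses) / len(losses)

def evaluate_and_track(model, val_samples, val_masks, adapter_params, best):
    """Evaluate and keep the lowest-loss adapter snapshot.

    `best` is a (loss, state_dict) pair; the caller invokes this every 25 steps,
    matching the cadence in the log, and finally saves best[1] to disk.
    `adapter_params` is a placeholder for however the trainable adapter weights
    are collected in the real code.
    """
    loss = validation_loss(model, val_samples, val_masks)
    if loss < best[0]:
        best = (loss, {k: v.detach().cpu().clone() for k, v in adapter_params.items()})
    return best
```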
"use_cache": true, "vocab_size": 128256 } [INFO|from_pretrained|modeling_utils.py:3426] 2024-05-20 17:01:22,383 > loading weights file /raid/LLM/llama3-8b/model.safetensors.index.json [INFO|_set_default_torch_dtype|modeling_utils.py:1494] 2024-05-20 17:01:22,384 > Instantiating LlamaForCausalLM model under default dtype torch.float32. [INFO|from_dict|configuration_utils.py:928] 2024-05-20 17:01:22,384 > Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": 128001 } [INFO|_load_pretrained_model|modeling_utils.py:4170] 2024-05-20 17:01:24,676 > All model checkpoint weights were used when initializing LlamaForCausalLM. [INFO|_load_pretrained_model|modeling_utils.py:4178] 2024-05-20 17:01:24,676 > All the weights of LlamaForCausalLM were initialized from the model checkpoint at /raid/LLM/llama3-8b. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training. [INFO|from_pretrained|configuration_utils.py:881] 2024-05-20 17:01:24,679 > loading configuration file /raid/LLM/llama3-8b/generation_config.json [INFO|from_dict|configuration_utils.py:928] 2024-05-20 17:01:24,680 > Generate config GenerationConfig { "bos_token_id": 128000, "eos_token_id": 128001 } [INFO||prep.py:357] 2024-05-20 17:01:24,681 > Overwrite Q weight [INFO||prep.py:359] 2024-05-20 17:01:28,423 > Save Model [INFO|save_pretrained|configuration_utils.py:471] 2024-05-20 17:01:28,424 > Configuration saved in /raid/LLM/llama3-8b_w2-quip-r64_c4-320_a-w-g_8-1e-4-unfinetuned-20240520_162342/config.json [INFO|save_pretrained|configuration_utils.py:697] 2024-05-20 17:01:28,425 > Configuration saved in /raid/LLM/llama3-8b_w2-quip-r64_c4-320_a-w-g_8-1e-4-unfinetuned-20240520_162342/generation_config.json [INFO|save_pretrained|modeling_utils.py:2598] 2024-05-20 17:02:00,528 > The model is bigger than the maximum size per checkpoint (5GB) and is going to be split in 7 checkpoint shards. You can find where each parameters has been saved in the index located at /raid/LLM/llama3-8b_w2-quip-r64_c4-320_a-w-g_8-1e-4-unfinetuned-20240520_162342/model.safetensors.index.json.