05/05/2024 16:36:00 - INFO - transformers.tokenization_utils_base - loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/tokenizer.model
05/05/2024 16:36:00 - INFO - transformers.tokenization_utils_base - loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/tokenizer.json
05/05/2024 16:36:00 - INFO - transformers.tokenization_utils_base - loading file added_tokens.json from cache at None
05/05/2024 16:36:00 - INFO - transformers.tokenization_utils_base - loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/special_tokens_map.json
05/05/2024 16:36:00 - INFO - transformers.tokenization_utils_base - loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/tokenizer_config.json
05/05/2024 16:36:00 - INFO - llmtuner.data.template - Add pad token:
05/05/2024 16:36:00 - INFO - llmtuner.data.loader - Loading dataset identity.json...
05/05/2024 16:36:00 - WARNING - llmtuner.data.utils - Checksum failed: mismatched SHA-1 hash value at data/identity.json.
05/05/2024 16:36:03 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json
05/05/2024 16:36:03 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "huggyllama/llama-7b", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 }
05/05/2024 16:36:04 - INFO - transformers.modeling_utils - loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/model.safetensors.index.json
05/05/2024 16:38:24 - INFO - transformers.modeling_utils - Instantiating LlamaForCausalLM model under default dtype torch.float16.
05/05/2024 16:38:24 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0 }
05/05/2024 16:39:24 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing LlamaForCausalLM.
05/05/2024 16:39:24 - INFO - transformers.modeling_utils - All the weights of LlamaForCausalLM were initialized from the model checkpoint at huggyllama/llama-7b. If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
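The WARNING above only means that data/identity.json no longer matches the SHA-1 hash recorded for it in data/dataset_info.json (the file is commonly edited to customize the assistant's identity); as the rest of the log shows, training continues regardless. A minimal sketch, using only the Python standard library, of how the digest could be recomputed and compared by hand; the expected value is a placeholder you would copy from your own dataset_info.json entry:

```python
import hashlib

# Recompute the SHA-1 of the dataset file named in the warning.
path = "data/identity.json"
with open(path, "rb") as f:
    actual_sha1 = hashlib.sha1(f.read()).hexdigest()

# Placeholder: paste the hash recorded for this dataset in data/dataset_info.json.
expected_sha1 = "<hash from dataset_info.json>"

print(f"{path}: {actual_sha1}")
print("match" if actual_sha1 == expected_sha1 else "mismatch: update the recorded hash or revert the file")
```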
05/05/2024 16:39:24 - INFO - transformers.generation.configuration_utils - loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/generation_config.json
05/05/2024 16:39:24 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0 }
05/05/2024 16:39:24 - INFO - llmtuner.model.utils.checkpointing - Gradient checkpointing enabled.
05/05/2024 16:39:24 - INFO - llmtuner.model.utils.attention - Using torch SDPA for faster training and inference.
05/05/2024 16:39:24 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
05/05/2024 16:39:24 - INFO - llmtuner.model.loader - trainable params: 4194304 || all params: 6742609920 || trainable%: 0.0622
05/05/2024 16:39:24 - INFO - transformers.trainer - Using auto half precision backend
05/05/2024 16:39:25 - INFO - transformers.trainer - ***** Running training *****
05/05/2024 16:39:25 - INFO - transformers.trainer - Num examples = 6,755
05/05/2024 16:39:25 - INFO - transformers.trainer - Num Epochs = 3
05/05/2024 16:39:25 - INFO - transformers.trainer - Instantaneous batch size per device = 2
05/05/2024 16:39:25 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 16
05/05/2024 16:39:25 - INFO - transformers.trainer - Gradient Accumulation steps = 8
05/05/2024 16:39:25 - INFO - transformers.trainer - Total optimization steps = 1,266
05/05/2024 16:39:25 - INFO - transformers.trainer - Number of trainable parameters = 4,194,304
05/05/2024 16:39:43 - INFO - llmtuner.extras.callbacks - {'loss': 3.8760, 'learning_rate': 4.9998e-05, 'epoch': 0.01}
05/05/2024 16:40:01 - INFO - llmtuner.extras.callbacks - {'loss': 3.8538, 'learning_rate': 4.9992e-05, 'epoch': 0.02}
05/05/2024 16:40:18 - INFO - llmtuner.extras.callbacks - {'loss': 3.5742, 'learning_rate': 4.9983e-05, 'epoch': 0.04}
05/05/2024 16:40:36 - INFO - llmtuner.extras.callbacks - {'loss': 3.5193, 'learning_rate': 4.9969e-05, 'epoch': 0.05}
05/05/2024 16:40:52 - INFO - llmtuner.extras.callbacks - {'loss': 2.9465, 'learning_rate': 4.9952e-05, 'epoch': 0.06}
05/05/2024 16:41:09 - INFO - llmtuner.extras.callbacks - {'loss': 3.0208, 'learning_rate': 4.9931e-05, 'epoch': 0.07}
05/05/2024 16:41:27 - INFO - llmtuner.extras.callbacks - {'loss': 2.7173, 'learning_rate': 4.9906e-05, 'epoch': 0.08}
05/05/2024 16:41:45 - INFO - llmtuner.extras.callbacks - {'loss': 2.6235, 'learning_rate': 4.9877e-05, 'epoch': 0.09}
05/05/2024 16:42:05 - INFO - llmtuner.extras.callbacks - {'loss': 2.4021, 'learning_rate': 4.9844e-05, 'epoch': 0.11}
05/05/2024 16:42:24 - INFO - llmtuner.extras.callbacks - {'loss': 2.1688, 'learning_rate': 4.9808e-05, 'epoch': 0.12}
05/05/2024 16:42:44 - INFO - llmtuner.extras.callbacks - {'loss': 2.2943, 'learning_rate': 4.9768e-05, 'epoch': 0.13}
05/05/2024 16:43:03 - INFO - llmtuner.extras.callbacks - {'loss': 1.9571, 'learning_rate': 4.9723e-05, 'epoch': 0.14}
05/05/2024 16:43:21 - INFO - llmtuner.extras.callbacks - {'loss': 2.1024, 'learning_rate': 4.9675e-05, 'epoch': 0.15}
05/05/2024 16:43:41 - INFO - llmtuner.extras.callbacks - {'loss': 1.9145, 'learning_rate': 4.9624e-05, 'epoch': 0.17}
05/05/2024 16:44:01 - INFO - llmtuner.extras.callbacks - {'loss': 2.1472, 'learning_rate': 4.9568e-05, 'epoch': 0.18}
05/05/2024 16:44:21 - INFO - llmtuner.extras.callbacks - {'loss': 2.0441, 'learning_rate': 4.9509e-05, 'epoch': 0.19}
05/05/2024 16:44:39 -
INFO - llmtuner.extras.callbacks - {'loss': 2.1605, 'learning_rate': 4.9446e-05, 'epoch': 0.20} 05/05/2024 16:44:57 - INFO - llmtuner.extras.callbacks - {'loss': 2.2801, 'learning_rate': 4.9379e-05, 'epoch': 0.21} 05/05/2024 16:45:16 - INFO - llmtuner.extras.callbacks - {'loss': 2.2327, 'learning_rate': 4.9309e-05, 'epoch': 0.22} 05/05/2024 16:45:36 - INFO - llmtuner.extras.callbacks - {'loss': 2.0031, 'learning_rate': 4.9234e-05, 'epoch': 0.24} 05/05/2024 16:45:36 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-100 05/05/2024 16:45:36 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 16:45:36 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 } 05/05/2024 16:45:36 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-100/tokenizer_config.json 05/05/2024 16:45:36 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/checkpoint-100/special_tokens_map.json 05/05/2024 16:45:55 - INFO - llmtuner.extras.callbacks - {'loss': 1.8637, 'learning_rate': 4.9156e-05, 'epoch': 0.25} 05/05/2024 16:46:15 - INFO - llmtuner.extras.callbacks - {'loss': 1.9813, 'learning_rate': 4.9074e-05, 'epoch': 0.26} 05/05/2024 16:46:34 - INFO - llmtuner.extras.callbacks - {'loss': 2.1698, 'learning_rate': 4.8989e-05, 'epoch': 0.27} 05/05/2024 16:46:52 - INFO - llmtuner.extras.callbacks - {'loss': 2.1691, 'learning_rate': 4.8900e-05, 'epoch': 0.28} 05/05/2024 16:47:15 - INFO - llmtuner.extras.callbacks - {'loss': 2.1437, 'learning_rate': 4.8807e-05, 'epoch': 0.30} 05/05/2024 16:47:34 - INFO - llmtuner.extras.callbacks - {'loss': 2.0780, 'learning_rate': 4.8710e-05, 'epoch': 0.31} 05/05/2024 16:47:56 - INFO - llmtuner.extras.callbacks - {'loss': 2.0338, 'learning_rate': 4.8610e-05, 'epoch': 0.32} 05/05/2024 16:48:16 - INFO - llmtuner.extras.callbacks - {'loss': 2.1387, 'learning_rate': 4.8506e-05, 'epoch': 0.33} 05/05/2024 16:48:35 - INFO - llmtuner.extras.callbacks - {'loss': 2.0853, 'learning_rate': 4.8399e-05, 'epoch': 0.34} 05/05/2024 16:48:55 - INFO - llmtuner.extras.callbacks - {'loss': 1.9919, 'learning_rate': 4.8288e-05, 'epoch': 0.36} 05/05/2024 16:49:17 - INFO - llmtuner.extras.callbacks - {'loss': 2.1078, 'learning_rate': 4.8173e-05, 'epoch': 0.37} 05/05/2024 16:49:38 - INFO - llmtuner.extras.callbacks - {'loss': 1.9385, 'learning_rate': 4.8055e-05, 'epoch': 0.38} 05/05/2024 16:49:58 - INFO - llmtuner.extras.callbacks - {'loss': 1.8268, 'learning_rate': 4.7934e-05, 'epoch': 0.39} 05/05/2024 16:50:18 - INFO - llmtuner.extras.callbacks - {'loss': 1.8595, 'learning_rate': 4.7808e-05, 
'epoch': 0.40} 05/05/2024 16:50:38 - INFO - llmtuner.extras.callbacks - {'loss': 2.0050, 'learning_rate': 4.7679e-05, 'epoch': 0.41} 05/05/2024 16:50:57 - INFO - llmtuner.extras.callbacks - {'loss': 2.0088, 'learning_rate': 4.7547e-05, 'epoch': 0.43} 05/05/2024 16:51:17 - INFO - llmtuner.extras.callbacks - {'loss': 2.1591, 'learning_rate': 4.7412e-05, 'epoch': 0.44} 05/05/2024 16:51:37 - INFO - llmtuner.extras.callbacks - {'loss': 1.9772, 'learning_rate': 4.7272e-05, 'epoch': 0.45} 05/05/2024 16:51:56 - INFO - llmtuner.extras.callbacks - {'loss': 1.9476, 'learning_rate': 4.7130e-05, 'epoch': 0.46} 05/05/2024 16:52:16 - INFO - llmtuner.extras.callbacks - {'loss': 2.0495, 'learning_rate': 4.6984e-05, 'epoch': 0.47} 05/05/2024 16:52:16 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-200 05/05/2024 16:52:16 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 16:52:16 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 } 05/05/2024 16:52:16 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-200/tokenizer_config.json 05/05/2024 16:52:16 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/checkpoint-200/special_tokens_map.json 05/05/2024 16:52:36 - INFO - llmtuner.extras.callbacks - {'loss': 2.0267, 'learning_rate': 4.6834e-05, 'epoch': 0.49} 05/05/2024 16:52:55 - INFO - llmtuner.extras.callbacks - {'loss': 1.8998, 'learning_rate': 4.6682e-05, 'epoch': 0.50} 05/05/2024 16:53:14 - INFO - llmtuner.extras.callbacks - {'loss': 1.9095, 'learning_rate': 4.6525e-05, 'epoch': 0.51} 05/05/2024 16:53:34 - INFO - llmtuner.extras.callbacks - {'loss': 1.9350, 'learning_rate': 4.6366e-05, 'epoch': 0.52} 05/05/2024 16:53:53 - INFO - llmtuner.extras.callbacks - {'loss': 2.0840, 'learning_rate': 4.6203e-05, 'epoch': 0.53} 05/05/2024 16:54:13 - INFO - llmtuner.extras.callbacks - {'loss': 1.9387, 'learning_rate': 4.6037e-05, 'epoch': 0.54} 05/05/2024 16:54:34 - INFO - llmtuner.extras.callbacks - {'loss': 1.8710, 'learning_rate': 4.5868e-05, 'epoch': 0.56} 05/05/2024 16:54:54 - INFO - llmtuner.extras.callbacks - {'loss': 1.9629, 'learning_rate': 4.5696e-05, 'epoch': 0.57} 05/05/2024 16:55:13 - INFO - llmtuner.extras.callbacks - {'loss': 2.1182, 'learning_rate': 4.5520e-05, 'epoch': 0.58} 05/05/2024 16:55:31 - INFO - llmtuner.extras.callbacks - {'loss': 1.8874, 'learning_rate': 4.5341e-05, 'epoch': 0.59} 05/05/2024 16:55:49 - INFO - llmtuner.extras.callbacks - {'loss': 1.8281, 'learning_rate': 4.5160e-05, 'epoch': 0.60} 05/05/2024 16:56:07 - INFO - llmtuner.extras.callbacks - {'loss': 
2.0387, 'learning_rate': 4.4975e-05, 'epoch': 0.62} 05/05/2024 16:56:27 - INFO - llmtuner.extras.callbacks - {'loss': 2.1307, 'learning_rate': 4.4787e-05, 'epoch': 0.63} 05/05/2024 16:56:47 - INFO - llmtuner.extras.callbacks - {'loss': 2.0755, 'learning_rate': 4.4595e-05, 'epoch': 0.64} 05/05/2024 16:57:07 - INFO - llmtuner.extras.callbacks - {'loss': 1.9149, 'learning_rate': 4.4401e-05, 'epoch': 0.65} 05/05/2024 16:57:30 - INFO - llmtuner.extras.callbacks - {'loss': 1.9840, 'learning_rate': 4.4204e-05, 'epoch': 0.66} 05/05/2024 16:57:48 - INFO - llmtuner.extras.callbacks - {'loss': 1.8697, 'learning_rate': 4.4004e-05, 'epoch': 0.67} 05/05/2024 16:58:10 - INFO - llmtuner.extras.callbacks - {'loss': 1.7951, 'learning_rate': 4.3801e-05, 'epoch': 0.69} 05/05/2024 16:58:29 - INFO - llmtuner.extras.callbacks - {'loss': 1.6276, 'learning_rate': 4.3595e-05, 'epoch': 0.70} 05/05/2024 16:58:50 - INFO - llmtuner.extras.callbacks - {'loss': 1.8438, 'learning_rate': 4.3386e-05, 'epoch': 0.71} 05/05/2024 16:58:50 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-300 05/05/2024 16:58:50 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 16:58:50 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 } 05/05/2024 16:58:50 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-300/tokenizer_config.json 05/05/2024 16:58:50 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/checkpoint-300/special_tokens_map.json 05/05/2024 16:59:10 - INFO - llmtuner.extras.callbacks - {'loss': 1.7839, 'learning_rate': 4.3175e-05, 'epoch': 0.72} 05/05/2024 16:59:31 - INFO - llmtuner.extras.callbacks - {'loss': 1.8604, 'learning_rate': 4.2960e-05, 'epoch': 0.73} 05/05/2024 16:59:49 - INFO - llmtuner.extras.callbacks - {'loss': 1.9514, 'learning_rate': 4.2743e-05, 'epoch': 0.75} 05/05/2024 17:00:08 - INFO - llmtuner.extras.callbacks - {'loss': 1.9439, 'learning_rate': 4.2523e-05, 'epoch': 0.76} 05/05/2024 17:00:28 - INFO - llmtuner.extras.callbacks - {'loss': 2.0266, 'learning_rate': 4.2301e-05, 'epoch': 0.77} 05/05/2024 17:00:47 - INFO - llmtuner.extras.callbacks - {'loss': 1.9851, 'learning_rate': 4.2076e-05, 'epoch': 0.78} 05/05/2024 17:01:06 - INFO - llmtuner.extras.callbacks - {'loss': 1.9760, 'learning_rate': 4.1848e-05, 'epoch': 0.79} 05/05/2024 17:01:26 - INFO - llmtuner.extras.callbacks - {'loss': 1.9039, 'learning_rate': 4.1617e-05, 'epoch': 0.81} 05/05/2024 17:01:45 - INFO - llmtuner.extras.callbacks - {'loss': 1.9577, 'learning_rate': 4.1384e-05, 'epoch': 0.82} 05/05/2024 17:02:03 - INFO - 
llmtuner.extras.callbacks - {'loss': 1.9182, 'learning_rate': 4.1149e-05, 'epoch': 0.83} 05/05/2024 17:02:23 - INFO - llmtuner.extras.callbacks - {'loss': 1.9574, 'learning_rate': 4.0911e-05, 'epoch': 0.84} 05/05/2024 17:02:41 - INFO - llmtuner.extras.callbacks - {'loss': 1.9129, 'learning_rate': 4.0670e-05, 'epoch': 0.85} 05/05/2024 17:03:04 - INFO - llmtuner.extras.callbacks - {'loss': 2.0108, 'learning_rate': 4.0427e-05, 'epoch': 0.86} 05/05/2024 17:03:23 - INFO - llmtuner.extras.callbacks - {'loss': 2.1509, 'learning_rate': 4.0182e-05, 'epoch': 0.88} 05/05/2024 17:03:43 - INFO - llmtuner.extras.callbacks - {'loss': 2.0100, 'learning_rate': 3.9934e-05, 'epoch': 0.89} 05/05/2024 17:04:02 - INFO - llmtuner.extras.callbacks - {'loss': 2.0312, 'learning_rate': 3.9685e-05, 'epoch': 0.90} 05/05/2024 17:04:21 - INFO - llmtuner.extras.callbacks - {'loss': 1.9705, 'learning_rate': 3.9432e-05, 'epoch': 0.91} 05/05/2024 17:04:41 - INFO - llmtuner.extras.callbacks - {'loss': 2.1293, 'learning_rate': 3.9178e-05, 'epoch': 0.92} 05/05/2024 17:05:01 - INFO - llmtuner.extras.callbacks - {'loss': 2.1897, 'learning_rate': 3.8921e-05, 'epoch': 0.94} 05/05/2024 17:05:20 - INFO - llmtuner.extras.callbacks - {'loss': 1.9660, 'learning_rate': 3.8663e-05, 'epoch': 0.95} 05/05/2024 17:05:20 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-400 05/05/2024 17:05:20 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 17:05:20 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 } 05/05/2024 17:05:21 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-400/tokenizer_config.json 05/05/2024 17:05:21 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/checkpoint-400/special_tokens_map.json 05/05/2024 17:05:39 - INFO - llmtuner.extras.callbacks - {'loss': 1.8980, 'learning_rate': 3.8402e-05, 'epoch': 0.96} 05/05/2024 17:05:59 - INFO - llmtuner.extras.callbacks - {'loss': 2.1288, 'learning_rate': 3.8139e-05, 'epoch': 0.97} 05/05/2024 17:06:18 - INFO - llmtuner.extras.callbacks - {'loss': 1.8449, 'learning_rate': 3.7874e-05, 'epoch': 0.98} 05/05/2024 17:06:37 - INFO - llmtuner.extras.callbacks - {'loss': 1.8785, 'learning_rate': 3.7607e-05, 'epoch': 0.99} 05/05/2024 17:06:56 - INFO - llmtuner.extras.callbacks - {'loss': 1.8010, 'learning_rate': 3.7338e-05, 'epoch': 1.01} 05/05/2024 17:07:14 - INFO - llmtuner.extras.callbacks - {'loss': 1.9475, 'learning_rate': 3.7068e-05, 'epoch': 1.02} 05/05/2024 17:07:34 - INFO - llmtuner.extras.callbacks - {'loss': 1.9996, 'learning_rate': 3.6795e-05, 'epoch': 
1.03} 05/05/2024 17:07:56 - INFO - llmtuner.extras.callbacks - {'loss': 1.9598, 'learning_rate': 3.6521e-05, 'epoch': 1.04} 05/05/2024 17:08:17 - INFO - llmtuner.extras.callbacks - {'loss': 1.9598, 'learning_rate': 3.6245e-05, 'epoch': 1.05} 05/05/2024 17:08:38 - INFO - llmtuner.extras.callbacks - {'loss': 1.9089, 'learning_rate': 3.5967e-05, 'epoch': 1.07} 05/05/2024 17:08:59 - INFO - llmtuner.extras.callbacks - {'loss': 1.9496, 'learning_rate': 3.5687e-05, 'epoch': 1.08} 05/05/2024 17:09:17 - INFO - llmtuner.extras.callbacks - {'loss': 1.9467, 'learning_rate': 3.5406e-05, 'epoch': 1.09} 05/05/2024 17:09:37 - INFO - llmtuner.extras.callbacks - {'loss': 2.0235, 'learning_rate': 3.5123e-05, 'epoch': 1.10} 05/05/2024 17:09:56 - INFO - llmtuner.extras.callbacks - {'loss': 1.8844, 'learning_rate': 3.4839e-05, 'epoch': 1.11} 05/05/2024 17:10:16 - INFO - llmtuner.extras.callbacks - {'loss': 1.8287, 'learning_rate': 3.4553e-05, 'epoch': 1.12} 05/05/2024 17:10:35 - INFO - llmtuner.extras.callbacks - {'loss': 2.0754, 'learning_rate': 3.4265e-05, 'epoch': 1.14} 05/05/2024 17:10:54 - INFO - llmtuner.extras.callbacks - {'loss': 1.9269, 'learning_rate': 3.3977e-05, 'epoch': 1.15} 05/05/2024 17:11:13 - INFO - llmtuner.extras.callbacks - {'loss': 2.0613, 'learning_rate': 3.3686e-05, 'epoch': 1.16} 05/05/2024 17:11:33 - INFO - llmtuner.extras.callbacks - {'loss': 1.8047, 'learning_rate': 3.3395e-05, 'epoch': 1.17} 05/05/2024 17:11:55 - INFO - llmtuner.extras.callbacks - {'loss': 1.8639, 'learning_rate': 3.3102e-05, 'epoch': 1.18} 05/05/2024 17:11:55 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-500 05/05/2024 17:11:55 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 17:11:55 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 } 05/05/2024 17:11:55 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-500/tokenizer_config.json 05/05/2024 17:11:55 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/checkpoint-500/special_tokens_map.json 05/05/2024 17:12:15 - INFO - llmtuner.extras.callbacks - {'loss': 1.9545, 'learning_rate': 3.2808e-05, 'epoch': 1.20} 05/05/2024 17:12:34 - INFO - llmtuner.extras.callbacks - {'loss': 1.8740, 'learning_rate': 3.2513e-05, 'epoch': 1.21} 05/05/2024 17:12:54 - INFO - llmtuner.extras.callbacks - {'loss': 1.9178, 'learning_rate': 3.2216e-05, 'epoch': 1.22} 05/05/2024 17:13:13 - INFO - llmtuner.extras.callbacks - {'loss': 1.9190, 'learning_rate': 3.1919e-05, 'epoch': 1.23} 05/05/2024 17:13:32 - INFO - llmtuner.extras.callbacks - {'loss': 1.9128, 
'learning_rate': 3.1620e-05, 'epoch': 1.24} 05/05/2024 17:13:50 - INFO - llmtuner.extras.callbacks - {'loss': 2.0132, 'learning_rate': 3.1321e-05, 'epoch': 1.26} 05/05/2024 17:14:10 - INFO - llmtuner.extras.callbacks - {'loss': 1.9014, 'learning_rate': 3.1020e-05, 'epoch': 1.27} 05/05/2024 17:14:30 - INFO - llmtuner.extras.callbacks - {'loss': 1.8482, 'learning_rate': 3.0718e-05, 'epoch': 1.28} 05/05/2024 17:14:49 - INFO - llmtuner.extras.callbacks - {'loss': 1.7475, 'learning_rate': 3.0416e-05, 'epoch': 1.29} 05/05/2024 17:15:08 - INFO - llmtuner.extras.callbacks - {'loss': 2.0185, 'learning_rate': 3.0113e-05, 'epoch': 1.30} 05/05/2024 17:15:27 - INFO - llmtuner.extras.callbacks - {'loss': 1.8825, 'learning_rate': 2.9809e-05, 'epoch': 1.31} 05/05/2024 17:15:49 - INFO - llmtuner.extras.callbacks - {'loss': 1.8319, 'learning_rate': 2.9504e-05, 'epoch': 1.33} 05/05/2024 17:16:08 - INFO - llmtuner.extras.callbacks - {'loss': 1.9613, 'learning_rate': 2.9199e-05, 'epoch': 1.34} 05/05/2024 17:16:29 - INFO - llmtuner.extras.callbacks - {'loss': 1.9367, 'learning_rate': 2.8892e-05, 'epoch': 1.35} 05/05/2024 17:16:48 - INFO - llmtuner.extras.callbacks - {'loss': 1.8950, 'learning_rate': 2.8586e-05, 'epoch': 1.36} 05/05/2024 17:17:09 - INFO - llmtuner.extras.callbacks - {'loss': 2.0023, 'learning_rate': 2.8279e-05, 'epoch': 1.37} 05/05/2024 17:17:27 - INFO - llmtuner.extras.callbacks - {'loss': 1.9469, 'learning_rate': 2.7971e-05, 'epoch': 1.39} 05/05/2024 17:17:46 - INFO - llmtuner.extras.callbacks - {'loss': 1.7915, 'learning_rate': 2.7663e-05, 'epoch': 1.40} 05/05/2024 17:18:06 - INFO - llmtuner.extras.callbacks - {'loss': 1.8253, 'learning_rate': 2.7354e-05, 'epoch': 1.41} 05/05/2024 17:18:26 - INFO - llmtuner.extras.callbacks - {'loss': 2.0741, 'learning_rate': 2.7045e-05, 'epoch': 1.42} 05/05/2024 17:18:26 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-600 05/05/2024 17:18:26 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 17:18:26 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 } 05/05/2024 17:18:26 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-600/tokenizer_config.json 05/05/2024 17:18:26 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/checkpoint-600/special_tokens_map.json 05/05/2024 17:18:45 - INFO - llmtuner.extras.callbacks - {'loss': 1.9010, 'learning_rate': 2.6736e-05, 'epoch': 1.43} 05/05/2024 17:19:06 - INFO - llmtuner.extras.callbacks - {'loss': 2.0099, 'learning_rate': 2.6426e-05, 'epoch': 1.44} 05/05/2024 17:19:25 - INFO - 
llmtuner.extras.callbacks - {'loss': 1.8853, 'learning_rate': 2.6116e-05, 'epoch': 1.46} 05/05/2024 17:19:43 - INFO - llmtuner.extras.callbacks - {'loss': 1.8205, 'learning_rate': 2.5806e-05, 'epoch': 1.47} 05/05/2024 17:20:03 - INFO - llmtuner.extras.callbacks - {'loss': 2.0851, 'learning_rate': 2.5496e-05, 'epoch': 1.48} 05/05/2024 17:20:24 - INFO - llmtuner.extras.callbacks - {'loss': 2.0913, 'learning_rate': 2.5186e-05, 'epoch': 1.49} 05/05/2024 17:20:43 - INFO - llmtuner.extras.callbacks - {'loss': 1.9521, 'learning_rate': 2.4876e-05, 'epoch': 1.50} 05/05/2024 17:21:04 - INFO - llmtuner.extras.callbacks - {'loss': 1.8525, 'learning_rate': 2.4566e-05, 'epoch': 1.52} 05/05/2024 17:21:25 - INFO - llmtuner.extras.callbacks - {'loss': 1.8034, 'learning_rate': 2.4256e-05, 'epoch': 1.53} 05/05/2024 17:21:44 - INFO - llmtuner.extras.callbacks - {'loss': 2.0530, 'learning_rate': 2.3946e-05, 'epoch': 1.54} 05/05/2024 17:22:03 - INFO - llmtuner.extras.callbacks - {'loss': 1.7877, 'learning_rate': 2.3636e-05, 'epoch': 1.55} 05/05/2024 17:22:22 - INFO - llmtuner.extras.callbacks - {'loss': 1.9865, 'learning_rate': 2.3326e-05, 'epoch': 1.56} 05/05/2024 17:22:41 - INFO - llmtuner.extras.callbacks - {'loss': 1.8799, 'learning_rate': 2.3017e-05, 'epoch': 1.57} 05/05/2024 17:23:00 - INFO - llmtuner.extras.callbacks - {'loss': 1.9329, 'learning_rate': 2.2708e-05, 'epoch': 1.59} 05/05/2024 17:23:18 - INFO - llmtuner.extras.callbacks - {'loss': 1.7828, 'learning_rate': 2.2399e-05, 'epoch': 1.60} 05/05/2024 17:23:37 - INFO - llmtuner.extras.callbacks - {'loss': 2.0114, 'learning_rate': 2.2091e-05, 'epoch': 1.61} 05/05/2024 17:23:56 - INFO - llmtuner.extras.callbacks - {'loss': 1.8671, 'learning_rate': 2.1783e-05, 'epoch': 1.62} 05/05/2024 17:24:14 - INFO - llmtuner.extras.callbacks - {'loss': 1.9453, 'learning_rate': 2.1476e-05, 'epoch': 1.63} 05/05/2024 17:24:36 - INFO - llmtuner.extras.callbacks - {'loss': 1.7151, 'learning_rate': 2.1169e-05, 'epoch': 1.65} 05/05/2024 17:24:54 - INFO - llmtuner.extras.callbacks - {'loss': 1.9591, 'learning_rate': 2.0863e-05, 'epoch': 1.66} 05/05/2024 17:24:54 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-700 05/05/2024 17:24:54 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 17:24:54 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 } 05/05/2024 17:24:55 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-700/tokenizer_config.json 05/05/2024 17:24:55 - INFO - transformers.tokenization_utils_base - Special tokens file saved in 
saves/LLaMA-7B/lora/custom1/checkpoint-700/special_tokens_map.json 05/05/2024 17:25:14 - INFO - llmtuner.extras.callbacks - {'loss': 1.8999, 'learning_rate': 2.0557e-05, 'epoch': 1.67} 05/05/2024 17:25:33 - INFO - llmtuner.extras.callbacks - {'loss': 2.0037, 'learning_rate': 2.0252e-05, 'epoch': 1.68} 05/05/2024 17:25:54 - INFO - llmtuner.extras.callbacks - {'loss': 1.7028, 'learning_rate': 1.9948e-05, 'epoch': 1.69} 05/05/2024 17:26:11 - INFO - llmtuner.extras.callbacks - {'loss': 2.1283, 'learning_rate': 1.9645e-05, 'epoch': 1.71} 05/05/2024 17:26:30 - INFO - llmtuner.extras.callbacks - {'loss': 2.0017, 'learning_rate': 1.9342e-05, 'epoch': 1.72} 05/05/2024 17:26:50 - INFO - llmtuner.extras.callbacks - {'loss': 1.7329, 'learning_rate': 1.9040e-05, 'epoch': 1.73} 05/05/2024 17:27:09 - INFO - llmtuner.extras.callbacks - {'loss': 1.9965, 'learning_rate': 1.8739e-05, 'epoch': 1.74} 05/05/2024 17:27:30 - INFO - llmtuner.extras.callbacks - {'loss': 1.9321, 'learning_rate': 1.8440e-05, 'epoch': 1.75} 05/05/2024 17:27:48 - INFO - llmtuner.extras.callbacks - {'loss': 1.9739, 'learning_rate': 1.8141e-05, 'epoch': 1.76} 05/05/2024 17:28:06 - INFO - llmtuner.extras.callbacks - {'loss': 1.8864, 'learning_rate': 1.7843e-05, 'epoch': 1.78} 05/05/2024 17:28:26 - INFO - llmtuner.extras.callbacks - {'loss': 1.9227, 'learning_rate': 1.7546e-05, 'epoch': 1.79} 05/05/2024 17:28:46 - INFO - llmtuner.extras.callbacks - {'loss': 2.0025, 'learning_rate': 1.7251e-05, 'epoch': 1.80} 05/05/2024 17:29:05 - INFO - llmtuner.extras.callbacks - {'loss': 1.8039, 'learning_rate': 1.6957e-05, 'epoch': 1.81} 05/05/2024 17:29:23 - INFO - llmtuner.extras.callbacks - {'loss': 1.8008, 'learning_rate': 1.6664e-05, 'epoch': 1.82} 05/05/2024 17:29:41 - INFO - llmtuner.extras.callbacks - {'loss': 1.9609, 'learning_rate': 1.6372e-05, 'epoch': 1.84} 05/05/2024 17:30:00 - INFO - llmtuner.extras.callbacks - {'loss': 2.0843, 'learning_rate': 1.6081e-05, 'epoch': 1.85} 05/05/2024 17:30:23 - INFO - llmtuner.extras.callbacks - {'loss': 1.8281, 'learning_rate': 1.5792e-05, 'epoch': 1.86} 05/05/2024 17:30:43 - INFO - llmtuner.extras.callbacks - {'loss': 1.9628, 'learning_rate': 1.5505e-05, 'epoch': 1.87} 05/05/2024 17:31:02 - INFO - llmtuner.extras.callbacks - {'loss': 1.9867, 'learning_rate': 1.5218e-05, 'epoch': 1.88} 05/05/2024 17:31:22 - INFO - llmtuner.extras.callbacks - {'loss': 1.8819, 'learning_rate': 1.4934e-05, 'epoch': 1.89} 05/05/2024 17:31:22 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-800 05/05/2024 17:31:22 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 17:31:22 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 
32000 } 05/05/2024 17:31:22 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-800/tokenizer_config.json 05/05/2024 17:31:22 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/checkpoint-800/special_tokens_map.json 05/05/2024 17:31:43 - INFO - llmtuner.extras.callbacks - {'loss': 1.8938, 'learning_rate': 1.4651e-05, 'epoch': 1.91} 05/05/2024 17:32:02 - INFO - llmtuner.extras.callbacks - {'loss': 1.6936, 'learning_rate': 1.4369e-05, 'epoch': 1.92} 05/05/2024 17:32:20 - INFO - llmtuner.extras.callbacks - {'loss': 2.1922, 'learning_rate': 1.4089e-05, 'epoch': 1.93} 05/05/2024 17:32:41 - INFO - llmtuner.extras.callbacks - {'loss': 1.8547, 'learning_rate': 1.3811e-05, 'epoch': 1.94} 05/05/2024 17:33:01 - INFO - llmtuner.extras.callbacks - {'loss': 1.9437, 'learning_rate': 1.3534e-05, 'epoch': 1.95} 05/05/2024 17:33:24 - INFO - llmtuner.extras.callbacks - {'loss': 1.7016, 'learning_rate': 1.3260e-05, 'epoch': 1.97} 05/05/2024 17:33:44 - INFO - llmtuner.extras.callbacks - {'loss': 1.8801, 'learning_rate': 1.2987e-05, 'epoch': 1.98} 05/05/2024 17:34:02 - INFO - llmtuner.extras.callbacks - {'loss': 1.9408, 'learning_rate': 1.2716e-05, 'epoch': 1.99} 05/05/2024 17:34:21 - INFO - llmtuner.extras.callbacks - {'loss': 1.6631, 'learning_rate': 1.2446e-05, 'epoch': 2.00} 05/05/2024 17:34:41 - INFO - llmtuner.extras.callbacks - {'loss': 1.8590, 'learning_rate': 1.2179e-05, 'epoch': 2.01} 05/05/2024 17:35:03 - INFO - llmtuner.extras.callbacks - {'loss': 1.9671, 'learning_rate': 1.1914e-05, 'epoch': 2.02} 05/05/2024 17:35:21 - INFO - llmtuner.extras.callbacks - {'loss': 1.8109, 'learning_rate': 1.1650e-05, 'epoch': 2.04} 05/05/2024 17:35:41 - INFO - llmtuner.extras.callbacks - {'loss': 1.8024, 'learning_rate': 1.1389e-05, 'epoch': 2.05} 05/05/2024 17:35:58 - INFO - llmtuner.extras.callbacks - {'loss': 1.8327, 'learning_rate': 1.1130e-05, 'epoch': 2.06} 05/05/2024 17:36:17 - INFO - llmtuner.extras.callbacks - {'loss': 1.9797, 'learning_rate': 1.0873e-05, 'epoch': 2.07} 05/05/2024 17:36:37 - INFO - llmtuner.extras.callbacks - {'loss': 1.8164, 'learning_rate': 1.0618e-05, 'epoch': 2.08} 05/05/2024 17:36:57 - INFO - llmtuner.extras.callbacks - {'loss': 1.7994, 'learning_rate': 1.0366e-05, 'epoch': 2.10} 05/05/2024 17:37:16 - INFO - llmtuner.extras.callbacks - {'loss': 1.9428, 'learning_rate': 1.0115e-05, 'epoch': 2.11} 05/05/2024 17:37:36 - INFO - llmtuner.extras.callbacks - {'loss': 1.9297, 'learning_rate': 9.8672e-06, 'epoch': 2.12} 05/05/2024 17:37:55 - INFO - llmtuner.extras.callbacks - {'loss': 1.8185, 'learning_rate': 9.6215e-06, 'epoch': 2.13} 05/05/2024 17:37:55 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-900 05/05/2024 17:37:55 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 17:37:55 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, 
"num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 } 05/05/2024 17:37:55 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-900/tokenizer_config.json 05/05/2024 17:37:55 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/checkpoint-900/special_tokens_map.json 05/05/2024 17:38:16 - INFO - llmtuner.extras.callbacks - {'loss': 1.7902, 'learning_rate': 9.3781e-06, 'epoch': 2.14} 05/05/2024 17:38:37 - INFO - llmtuner.extras.callbacks - {'loss': 1.7111, 'learning_rate': 9.1372e-06, 'epoch': 2.16} 05/05/2024 17:38:58 - INFO - llmtuner.extras.callbacks - {'loss': 2.0400, 'learning_rate': 8.8986e-06, 'epoch': 2.17} 05/05/2024 17:39:16 - INFO - llmtuner.extras.callbacks - {'loss': 1.9046, 'learning_rate': 8.6626e-06, 'epoch': 2.18} 05/05/2024 17:39:34 - INFO - llmtuner.extras.callbacks - {'loss': 2.1426, 'learning_rate': 8.4291e-06, 'epoch': 2.19} 05/05/2024 17:39:55 - INFO - llmtuner.extras.callbacks - {'loss': 2.0743, 'learning_rate': 8.1981e-06, 'epoch': 2.20} 05/05/2024 17:40:15 - INFO - llmtuner.extras.callbacks - {'loss': 1.7135, 'learning_rate': 7.9697e-06, 'epoch': 2.21} 05/05/2024 17:40:34 - INFO - llmtuner.extras.callbacks - {'loss': 1.8054, 'learning_rate': 7.7439e-06, 'epoch': 2.23} 05/05/2024 17:40:52 - INFO - llmtuner.extras.callbacks - {'loss': 1.8244, 'learning_rate': 7.5208e-06, 'epoch': 2.24} 05/05/2024 17:41:14 - INFO - llmtuner.extras.callbacks - {'loss': 1.8097, 'learning_rate': 7.3004e-06, 'epoch': 2.25} 05/05/2024 17:41:32 - INFO - llmtuner.extras.callbacks - {'loss': 1.9919, 'learning_rate': 7.0827e-06, 'epoch': 2.26} 05/05/2024 17:41:51 - INFO - llmtuner.extras.callbacks - {'loss': 1.9084, 'learning_rate': 6.8678e-06, 'epoch': 2.27} 05/05/2024 17:42:11 - INFO - llmtuner.extras.callbacks - {'loss': 1.7835, 'learning_rate': 6.6556e-06, 'epoch': 2.29} 05/05/2024 17:42:30 - INFO - llmtuner.extras.callbacks - {'loss': 1.7542, 'learning_rate': 6.4463e-06, 'epoch': 2.30} 05/05/2024 17:42:49 - INFO - llmtuner.extras.callbacks - {'loss': 2.0351, 'learning_rate': 6.2398e-06, 'epoch': 2.31} 05/05/2024 17:43:10 - INFO - llmtuner.extras.callbacks - {'loss': 1.7908, 'learning_rate': 6.0363e-06, 'epoch': 2.32} 05/05/2024 17:43:29 - INFO - llmtuner.extras.callbacks - {'loss': 1.7511, 'learning_rate': 5.8356e-06, 'epoch': 2.33} 05/05/2024 17:43:48 - INFO - llmtuner.extras.callbacks - {'loss': 1.9621, 'learning_rate': 5.6379e-06, 'epoch': 2.34} 05/05/2024 17:44:07 - INFO - llmtuner.extras.callbacks - {'loss': 2.0578, 'learning_rate': 5.4432e-06, 'epoch': 2.36} 05/05/2024 17:44:29 - INFO - llmtuner.extras.callbacks - {'loss': 1.9198, 'learning_rate': 5.2514e-06, 'epoch': 2.37} 05/05/2024 17:44:29 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-1000 05/05/2024 17:44:29 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 17:44:29 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, 
"attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 } 05/05/2024 17:44:29 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-1000/tokenizer_config.json 05/05/2024 17:44:29 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/checkpoint-1000/special_tokens_map.json 05/05/2024 17:44:48 - INFO - llmtuner.extras.callbacks - {'loss': 1.6089, 'learning_rate': 5.0628e-06, 'epoch': 2.38} 05/05/2024 17:45:08 - INFO - llmtuner.extras.callbacks - {'loss': 1.9499, 'learning_rate': 4.8772e-06, 'epoch': 2.39} 05/05/2024 17:45:27 - INFO - llmtuner.extras.callbacks - {'loss': 2.0044, 'learning_rate': 4.6947e-06, 'epoch': 2.40} 05/05/2024 17:45:47 - INFO - llmtuner.extras.callbacks - {'loss': 2.0896, 'learning_rate': 4.5153e-06, 'epoch': 2.42} 05/05/2024 17:46:06 - INFO - llmtuner.extras.callbacks - {'loss': 1.8803, 'learning_rate': 4.3390e-06, 'epoch': 2.43} 05/05/2024 17:46:24 - INFO - llmtuner.extras.callbacks - {'loss': 2.0795, 'learning_rate': 4.1660e-06, 'epoch': 2.44} 05/05/2024 17:46:43 - INFO - llmtuner.extras.callbacks - {'loss': 1.7221, 'learning_rate': 3.9961e-06, 'epoch': 2.45} 05/05/2024 17:47:02 - INFO - llmtuner.extras.callbacks - {'loss': 1.6881, 'learning_rate': 3.8295e-06, 'epoch': 2.46} 05/05/2024 17:47:22 - INFO - llmtuner.extras.callbacks - {'loss': 1.8039, 'learning_rate': 3.6662e-06, 'epoch': 2.47} 05/05/2024 17:47:40 - INFO - llmtuner.extras.callbacks - {'loss': 1.9333, 'learning_rate': 3.5061e-06, 'epoch': 2.49} 05/05/2024 17:48:00 - INFO - llmtuner.extras.callbacks - {'loss': 1.6731, 'learning_rate': 3.3494e-06, 'epoch': 2.50} 05/05/2024 17:48:19 - INFO - llmtuner.extras.callbacks - {'loss': 1.8437, 'learning_rate': 3.1959e-06, 'epoch': 2.51} 05/05/2024 17:48:38 - INFO - llmtuner.extras.callbacks - {'loss': 1.9577, 'learning_rate': 3.0459e-06, 'epoch': 2.52} 05/05/2024 17:48:59 - INFO - llmtuner.extras.callbacks - {'loss': 2.0162, 'learning_rate': 2.8992e-06, 'epoch': 2.53} 05/05/2024 17:49:19 - INFO - llmtuner.extras.callbacks - {'loss': 1.8091, 'learning_rate': 2.7559e-06, 'epoch': 2.55} 05/05/2024 17:49:40 - INFO - llmtuner.extras.callbacks - {'loss': 1.7485, 'learning_rate': 2.6160e-06, 'epoch': 2.56} 05/05/2024 17:49:59 - INFO - llmtuner.extras.callbacks - {'loss': 1.9556, 'learning_rate': 2.4796e-06, 'epoch': 2.57} 05/05/2024 17:50:19 - INFO - llmtuner.extras.callbacks - {'loss': 1.7762, 'learning_rate': 2.3467e-06, 'epoch': 2.58} 05/05/2024 17:50:38 - INFO - llmtuner.extras.callbacks - {'loss': 1.7463, 'learning_rate': 2.2172e-06, 'epoch': 2.59} 05/05/2024 17:50:56 - INFO - llmtuner.extras.callbacks - {'loss': 1.9825, 'learning_rate': 2.0913e-06, 'epoch': 2.61} 05/05/2024 17:50:56 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-1100 05/05/2024 17:50:56 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at 
/root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 17:50:56 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 } 05/05/2024 17:50:56 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-1100/tokenizer_config.json 05/05/2024 17:50:56 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/checkpoint-1100/special_tokens_map.json 05/05/2024 17:51:15 - INFO - llmtuner.extras.callbacks - {'loss': 1.8317, 'learning_rate': 1.9688e-06, 'epoch': 2.62} 05/05/2024 17:51:33 - INFO - llmtuner.extras.callbacks - {'loss': 1.7377, 'learning_rate': 1.8500e-06, 'epoch': 2.63} 05/05/2024 17:51:53 - INFO - llmtuner.extras.callbacks - {'loss': 1.8165, 'learning_rate': 1.7346e-06, 'epoch': 2.64} 05/05/2024 17:52:11 - INFO - llmtuner.extras.callbacks - {'loss': 1.8233, 'learning_rate': 1.6229e-06, 'epoch': 2.65} 05/05/2024 17:52:29 - INFO - llmtuner.extras.callbacks - {'loss': 1.9029, 'learning_rate': 1.5148e-06, 'epoch': 2.66} 05/05/2024 17:52:50 - INFO - llmtuner.extras.callbacks - {'loss': 1.8527, 'learning_rate': 1.4102e-06, 'epoch': 2.68} 05/05/2024 17:53:08 - INFO - llmtuner.extras.callbacks - {'loss': 1.8855, 'learning_rate': 1.3094e-06, 'epoch': 2.69} 05/05/2024 17:53:27 - INFO - llmtuner.extras.callbacks - {'loss': 1.8746, 'learning_rate': 1.2121e-06, 'epoch': 2.70} 05/05/2024 17:53:46 - INFO - llmtuner.extras.callbacks - {'loss': 1.6960, 'learning_rate': 1.1185e-06, 'epoch': 2.71} 05/05/2024 17:54:07 - INFO - llmtuner.extras.callbacks - {'loss': 1.7904, 'learning_rate': 1.0286e-06, 'epoch': 2.72} 05/05/2024 17:54:27 - INFO - llmtuner.extras.callbacks - {'loss': 1.9620, 'learning_rate': 9.4241e-07, 'epoch': 2.74} 05/05/2024 17:54:44 - INFO - llmtuner.extras.callbacks - {'loss': 1.9262, 'learning_rate': 8.5990e-07, 'epoch': 2.75} 05/05/2024 17:55:04 - INFO - llmtuner.extras.callbacks - {'loss': 1.8789, 'learning_rate': 7.8111e-07, 'epoch': 2.76} 05/05/2024 17:55:22 - INFO - llmtuner.extras.callbacks - {'loss': 1.7546, 'learning_rate': 7.0604e-07, 'epoch': 2.77} 05/05/2024 17:55:43 - INFO - llmtuner.extras.callbacks - {'loss': 1.7979, 'learning_rate': 6.3472e-07, 'epoch': 2.78} 05/05/2024 17:56:02 - INFO - llmtuner.extras.callbacks - {'loss': 1.7874, 'learning_rate': 5.6714e-07, 'epoch': 2.79} 05/05/2024 17:56:21 - INFO - llmtuner.extras.callbacks - {'loss': 1.6197, 'learning_rate': 5.0333e-07, 'epoch': 2.81} 05/05/2024 17:56:39 - INFO - llmtuner.extras.callbacks - {'loss': 1.5225, 'learning_rate': 4.4328e-07, 'epoch': 2.82} 05/05/2024 17:56:57 - INFO - llmtuner.extras.callbacks - {'loss': 1.9811, 'learning_rate': 3.8702e-07, 'epoch': 2.83} 05/05/2024 17:57:16 - INFO - llmtuner.extras.callbacks - {'loss': 1.9999, 
'learning_rate': 3.3455e-07, 'epoch': 2.84} 05/05/2024 17:57:16 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1/checkpoint-1200 05/05/2024 17:57:17 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json 05/05/2024 17:57:17 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 } 05/05/2024 17:57:17 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/checkpoint-1200/tokenizer_config.json 05/05/2024 17:57:17 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/checkpoint-1200/special_tokens_map.json 05/05/2024 17:57:36 - INFO - llmtuner.extras.callbacks - {'loss': 2.1399, 'learning_rate': 2.8587e-07, 'epoch': 2.85} 05/05/2024 17:57:57 - INFO - llmtuner.extras.callbacks - {'loss': 1.8664, 'learning_rate': 2.4100e-07, 'epoch': 2.87} 05/05/2024 17:58:16 - INFO - llmtuner.extras.callbacks - {'loss': 1.9633, 'learning_rate': 1.9994e-07, 'epoch': 2.88} 05/05/2024 17:58:36 - INFO - llmtuner.extras.callbacks - {'loss': 2.0106, 'learning_rate': 1.6270e-07, 'epoch': 2.89} 05/05/2024 17:58:55 - INFO - llmtuner.extras.callbacks - {'loss': 1.7896, 'learning_rate': 1.2928e-07, 'epoch': 2.90} 05/05/2024 17:59:16 - INFO - llmtuner.extras.callbacks - {'loss': 1.8759, 'learning_rate': 9.9692e-08, 'epoch': 2.91} 05/05/2024 17:59:35 - INFO - llmtuner.extras.callbacks - {'loss': 1.9413, 'learning_rate': 7.3935e-08, 'epoch': 2.92} 05/05/2024 17:59:55 - INFO - llmtuner.extras.callbacks - {'loss': 1.8404, 'learning_rate': 5.2016e-08, 'epoch': 2.94} 05/05/2024 18:00:15 - INFO - llmtuner.extras.callbacks - {'loss': 2.0654, 'learning_rate': 3.3938e-08, 'epoch': 2.95} 05/05/2024 18:00:36 - INFO - llmtuner.extras.callbacks - {'loss': 1.8111, 'learning_rate': 1.9703e-08, 'epoch': 2.96} 05/05/2024 18:00:55 - INFO - llmtuner.extras.callbacks - {'loss': 1.8708, 'learning_rate': 9.3132e-09, 'epoch': 2.97} 05/05/2024 18:01:15 - INFO - llmtuner.extras.callbacks - {'loss': 1.8842, 'learning_rate': 2.7710e-09, 'epoch': 2.98} 05/05/2024 18:01:34 - INFO - llmtuner.extras.callbacks - {'loss': 1.9665, 'learning_rate': 7.6974e-11, 'epoch': 3.00} 05/05/2024 18:01:37 - INFO - transformers.trainer - Training completed. 
Do not forget to share your model on huggingface.co/models =)
05/05/2024 18:01:37 - INFO - transformers.trainer - Saving model checkpoint to saves/LLaMA-7B/lora/custom1
05/05/2024 18:01:37 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--huggyllama--llama-7b/snapshots/8416d3fefb0cb3ff5775a7b13c1692d10ff1aa16/config.json
05/05/2024 18:01:37 - INFO - transformers.configuration_utils - Model config LlamaConfig { "_name_or_path": "/home/sgugger/tmp/llama/llama-7b/", "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 1, "eos_token_id": 2, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 11008, "max_position_embeddings": 2048, "max_sequence_length": 2048, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 32, "pad_token_id": 0, "pretraining_tp": 1, "rms_norm_eps": 1e-06, "rope_scaling": null, "rope_theta": 10000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.40.1", "use_cache": true, "vocab_size": 32000 }
05/05/2024 18:01:37 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/LLaMA-7B/lora/custom1/tokenizer_config.json
05/05/2024 18:01:37 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/LLaMA-7B/lora/custom1/special_tokens_map.json
05/05/2024 18:01:37 - INFO - transformers.modelcard - Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
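At this point saves/LLaMA-7B/lora/custom1 holds the LoRA adapter (the ~4.2 M trainable parameters reported at the start of training) plus the tokenizer files saved above, not a full set of model weights. A minimal sketch, assuming the transformers and peft packages used for training are installed, of how such an adapter could be attached to the huggyllama/llama-7b base model for a quick generation check; the prompt and generation settings are illustrative only:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "huggyllama/llama-7b"              # base model from the log above
adapter_dir = "saves/LLaMA-7B/lora/custom1"  # final adapter directory from the log above

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires the accelerate package
)
model = PeftModel.from_pretrained(model, adapter_dir)  # attach the LoRA weights
model.eval()

inputs = tokenizer("Hello, who are you?", return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

If a standalone checkpoint is needed, peft's merge_and_unload() can fold the adapter into the base weights before saving; LLaMA-Factory's own export utilities serve the same purpose.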