06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/vocab.json
06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file merges.txt from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/merges.txt
06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/tokenizer.json
06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file added_tokens.json from cache at None
06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file special_tokens_map.json from cache at None
06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/tokenizer_config.json
06/23/2024 20:22:06 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/23/2024 20:22:06 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
06/23/2024 20:22:06 - INFO - llamafactory.data.loader - Loading dataset pretraining.json...
06/23/2024 20:22:07 - WARNING - datasets.arrow_dataset - num_proc must be <= 1. Reducing num_proc to 1 for dataset of size 1.
06/23/2024 20:22:07 - WARNING - datasets.arrow_dataset - num_proc must be <= 1. Reducing num_proc to 1 for dataset of size 1.
06/23/2024 20:22:07 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/23/2024 20:22:07 - INFO - transformers.configuration_utils - Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
06/23/2024 20:22:07 - INFO - transformers.modeling_utils - loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/model.safetensors.index.json
06/23/2024 20:22:07 - INFO - transformers.modeling_utils - Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
06/23/2024 20:22:07 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}
06/23/2024 20:22:19 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing Qwen2ForCausalLM.
06/23/2024 20:22:19 - INFO - transformers.modeling_utils - All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2-7B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
06/23/2024 20:22:19 - INFO - transformers.generation.configuration_utils - loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/generation_config.json
06/23/2024 20:22:19 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
06/23/2024 20:22:19 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/23/2024 20:22:19 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/23/2024 20:22:19 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/23/2024 20:22:19 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/23/2024 20:22:19 - INFO - llamafactory.model.model_utils.misc - Found linear modules: v_proj,o_proj,down_proj,q_proj,k_proj,gate_proj,up_proj
06/23/2024 20:22:31 - INFO - llamafactory.model.loader - trainable params: 645922816 || all params: 8261539328 || trainable%: 7.8184
06/23/2024 20:22:31 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
06/23/2024 20:22:32 - INFO - transformers.trainer - Using auto half precision backend
06/23/2024 20:22:32 - INFO - transformers.trainer - ***** Running training *****
06/23/2024 20:22:32 - INFO - transformers.trainer - Num examples = 46
06/23/2024 20:22:32 - INFO - transformers.trainer - Num Epochs = 8
06/23/2024 20:22:32 - INFO - transformers.trainer - Instantaneous batch size per device = 1
06/23/2024 20:22:32 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 8
06/23/2024 20:22:32 - INFO - transformers.trainer - Gradient Accumulation steps = 8
06/23/2024 20:22:32 - INFO - transformers.trainer - Total optimization steps = 40
06/23/2024 20:22:32 - INFO - transformers.trainer - Number of trainable parameters = 645,922,816
06/23/2024 20:22:44 - INFO - llamafactory.extras.callbacks - {'loss': 1.4129, 'learning_rate': 1.9969e-04, 'epoch': 0.17, 'throughput': 1290.60}
06/23/2024 20:22:56 - INFO - llamafactory.extras.callbacks - {'loss': 1.3486, 'learning_rate': 1.9877e-04, 'epoch': 0.35, 'throughput': 1327.07}
06/23/2024 20:23:09 - INFO - llamafactory.extras.callbacks - {'loss': 1.3940, 'learning_rate': 1.9724e-04, 'epoch': 0.52, 'throughput': 1338.81}
06/23/2024 20:23:21 - INFO - llamafactory.extras.callbacks - {'loss': 1.3051, 'learning_rate': 1.9511e-04, 'epoch': 0.70, 'throughput': 1343.63}
06/23/2024 20:23:33 - INFO - llamafactory.extras.callbacks - {'loss': 1.3236, 'learning_rate': 1.9239e-04, 'epoch': 0.87, 'throughput': 1332.49}
06/23/2024 20:23:46 - INFO - llamafactory.extras.callbacks - {'loss': 1.4556, 'learning_rate': 1.8910e-04, 'epoch': 1.04, 'throughput': 1321.22}
06/23/2024 20:23:59 - INFO - llamafactory.extras.callbacks - {'loss': 1.3086, 'learning_rate': 1.8526e-04, 'epoch': 1.22, 'throughput': 1318.95}
06/23/2024 20:24:12 - INFO - llamafactory.extras.callbacks - {'loss': 1.1029, 'learning_rate': 1.8090e-04, 'epoch': 1.39, 'throughput': 1313.23}
06/23/2024 20:24:25 - INFO - llamafactory.extras.callbacks - {'loss': 1.2992, 'learning_rate': 1.7604e-04, 'epoch': 1.57, 'throughput': 1307.87}
06/23/2024 20:24:37 - INFO - llamafactory.extras.callbacks - {'loss': 1.1914, 'learning_rate': 1.7071e-04, 'epoch': 1.74, 'throughput': 1306.85}
06/23/2024 20:24:50 - INFO - llamafactory.extras.callbacks - {'loss': 1.2417, 'learning_rate': 1.6494e-04, 'epoch': 1.91, 'throughput': 1303.78}
06/23/2024 20:25:03 - INFO - llamafactory.extras.callbacks - {'loss': 1.3909, 'learning_rate': 1.5878e-04, 'epoch': 2.09, 'throughput': 1301.25}
06/23/2024 20:25:16 - INFO - llamafactory.extras.callbacks - {'loss': 1.1383, 'learning_rate': 1.5225e-04, 'epoch': 2.26, 'throughput': 1300.97}
06/23/2024 20:25:28 - INFO - llamafactory.extras.callbacks - {'loss': 1.2009, 'learning_rate': 1.4540e-04, 'epoch': 2.43, 'throughput': 1298.29}
06/23/2024 20:25:41 - INFO - llamafactory.extras.callbacks - {'loss': 1.1918, 'learning_rate': 1.3827e-04, 'epoch': 2.61, 'throughput': 1296.23}
06/23/2024 20:25:54 - INFO - llamafactory.extras.callbacks - {'loss': 1.0642, 'learning_rate': 1.3090e-04, 'epoch': 2.78, 'throughput': 1296.29}
06/23/2024 20:26:07 - INFO - llamafactory.extras.callbacks - {'loss': 1.2661, 'learning_rate': 1.2334e-04, 'epoch': 2.96, 'throughput': 1294.50}
06/23/2024 20:26:20 - INFO - llamafactory.extras.callbacks - {'loss': 1.0416, 'learning_rate': 1.1564e-04, 'epoch': 3.13, 'throughput': 1293.00}
06/23/2024 20:26:32 - INFO - llamafactory.extras.callbacks - {'loss': 1.0564, 'learning_rate': 1.0785e-04, 'epoch': 3.30, 'throughput': 1293.37}
06/23/2024 20:26:45 - INFO - llamafactory.extras.callbacks - {'loss': 0.9894, 'learning_rate': 1.0000e-04, 'epoch': 3.48, 'throughput': 1292.09}
06/23/2024 20:26:58 - INFO - llamafactory.extras.callbacks - {'loss': 1.0994, 'learning_rate': 9.2154e-05, 'epoch': 3.65, 'throughput': 1291.15}
06/23/2024 20:27:11 - INFO - llamafactory.extras.callbacks - {'loss': 1.3105, 'learning_rate': 8.4357e-05, 'epoch': 3.83, 'throughput': 1291.31}
06/23/2024 20:27:24 - INFO - llamafactory.extras.callbacks - {'loss': 1.0337, 'learning_rate': 7.6655e-05, 'epoch': 4.00, 'throughput': 1290.40}
06/23/2024 20:27:37 - INFO - llamafactory.extras.callbacks - {'loss': 1.0329, 'learning_rate': 6.9098e-05, 'epoch': 4.17, 'throughput': 1289.52}
06/23/2024 20:27:49 - INFO - llamafactory.extras.callbacks - {'loss': 0.9724, 'learning_rate': 6.1732e-05, 'epoch': 4.35, 'throughput': 1289.76}
06/23/2024 20:28:02 - INFO - llamafactory.extras.callbacks - {'loss': 1.1225, 'learning_rate': 5.4601e-05, 'epoch': 4.52, 'throughput': 1288.92}
06/23/2024 20:28:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.9500, 'learning_rate': 4.7750e-05, 'epoch': 4.70, 'throughput': 1288.18}
06/23/2024 20:28:28 - INFO - llamafactory.extras.callbacks - {'loss': 1.0672, 'learning_rate': 4.1221e-05, 'epoch': 4.87, 'throughput': 1288.47}
06/23/2024 20:28:41 - INFO - llamafactory.extras.callbacks - {'loss': 0.9970, 'learning_rate': 3.5055e-05, 'epoch': 5.04, 'throughput': 1287.83}
06/23/2024 20:28:54 - INFO - llamafactory.extras.callbacks - {'loss': 0.8303, 'learning_rate': 2.9289e-05, 'epoch': 5.22, 'throughput': 1287.20}
06/23/2024 20:29:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.8613, 'learning_rate': 2.3959e-05, 'epoch': 5.39, 'throughput': 1287.49}
06/23/2024 20:29:19 - INFO - llamafactory.extras.callbacks - {'loss': 0.8735, 'learning_rate': 1.9098e-05, 'epoch': 5.57, 'throughput': 1286.95}
06/23/2024 20:29:32 - INFO - llamafactory.extras.callbacks - {'loss': 1.1102, 'learning_rate': 1.4736e-05, 'epoch': 5.74, 'throughput': 1286.34}
06/23/2024 20:29:45 - INFO - llamafactory.extras.callbacks - {'loss': 1.2143, 'learning_rate': 1.0899e-05, 'epoch': 5.91, 'throughput': 1286.56}
06/23/2024 20:29:58 - INFO - llamafactory.extras.callbacks - {'loss': 1.1449, 'learning_rate': 7.6120e-06, 'epoch': 6.09, 'throughput': 1286.09}
06/23/2024 20:30:11 - INFO - llamafactory.extras.callbacks - {'loss': 1.0676, 'learning_rate': 4.8943e-06, 'epoch': 6.26, 'throughput': 1285.62}
06/23/2024 20:30:23 - INFO - llamafactory.extras.callbacks - {'loss': 0.9810, 'learning_rate': 2.7630e-06, 'epoch': 6.43, 'throughput': 1285.96}
06/23/2024 20:30:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.8797, 'learning_rate': 1.2312e-06, 'epoch': 6.61, 'throughput': 1285.52}
06/23/2024 20:30:49 - INFO - llamafactory.extras.callbacks - {'loss': 0.8676, 'learning_rate': 3.0827e-07, 'epoch': 6.78, 'throughput': 1285.12}
06/23/2024 20:31:02 - INFO - llamafactory.extras.callbacks - {'loss': 1.0274, 'learning_rate': 0.0000e+00, 'epoch': 6.96, 'throughput': 1285.35}
06/23/2024 20:31:02 - INFO - transformers.trainer - Training completed. Do not forget to share your model on huggingface.co/models =)
06/23/2024 20:31:02 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/final-pretrain-v2
06/23/2024 20:31:02 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/23/2024 20:31:02 - INFO - transformers.configuration_utils - Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
06/23/2024 20:31:04 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/final-pretrain-v2/tokenizer_config.json
06/23/2024 20:31:04 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/final-pretrain-v2/special_tokens_map.json
06/23/2024 20:31:05 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
06/23/2024 20:31:05 - INFO - transformers.modelcard - Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
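The run above ends with the LoRA adapter being written to saves/Qwen2-7B-Chat/lora/final-pretrain-v2. A minimal sketch of how that adapter could be loaded for inference with transformers + peft follows; only the base model id and the adapter path come from the log, everything else (dtype/device settings, merging the adapter) is an assumption and not part of the logged run.

# Minimal sketch: load the base Qwen2-7B-Instruct checkpoint and attach the LoRA
# adapter saved by this run. Paths/model id are taken from the log; the rest is
# an illustrative assumption, not a record of what the run itself did.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",   # base checkpoint used in the log
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

# Attach the adapter produced by this training run.
model = PeftModel.from_pretrained(base, "saves/Qwen2-7B-Chat/lora/final-pretrain-v2")

# Optionally fold the LoRA weights into the base model for plain transformers inference.
model = model.merge_and_unload()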