06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/vocab.json
06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file merges.txt from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/merges.txt
06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/tokenizer.json
06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file added_tokens.json from cache at None
06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file special_tokens_map.json from cache at None
06/23/2024 20:22:06 - INFO - transformers.tokenization_utils_base - loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/tokenizer_config.json
06/23/2024 20:22:06 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
06/23/2024 20:22:06 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
06/23/2024 20:22:06 - INFO - llamafactory.data.loader - Loading dataset pretraining.json...
06/23/2024 20:22:07 - WARNING - datasets.arrow_dataset - num_proc must be <= 1. Reducing num_proc to 1 for dataset of size 1.
06/23/2024 20:22:07 - WARNING - datasets.arrow_dataset - num_proc must be <= 1. Reducing num_proc to 1 for dataset of size 1.
06/23/2024 20:22:07 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/23/2024 20:22:07 - INFO - transformers.configuration_utils - Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen2-7B-Instruct",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
06/23/2024 20:22:07 - INFO - transformers.modeling_utils - loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/model.safetensors.index.json
06/23/2024 20:22:07 - INFO - transformers.modeling_utils - Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
06/23/2024 20:22:07 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}
06/23/2024 20:22:19 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing Qwen2ForCausalLM.
06/23/2024 20:22:19 - INFO - transformers.modeling_utils - All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen2-7B-Instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
06/23/2024 20:22:19 - INFO - transformers.generation.configuration_utils - loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/generation_config.json
06/23/2024 20:22:19 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.05,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
06/23/2024 20:22:19 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
06/23/2024 20:22:19 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
06/23/2024 20:22:19 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
06/23/2024 20:22:19 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
06/23/2024 20:22:19 - INFO - llamafactory.model.model_utils.misc - Found linear modules: v_proj,o_proj,down_proj,q_proj,k_proj,gate_proj,up_proj
06/23/2024 20:22:31 - INFO - llamafactory.model.loader - trainable params: 645922816 || all params: 8261539328 || trainable%: 7.8184
06/23/2024 20:22:31 - WARNING - accelerate.utils.other - Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
06/23/2024 20:22:32 - INFO - transformers.trainer - Using auto half precision backend
06/23/2024 20:22:32 - INFO - transformers.trainer - ***** Running training *****
06/23/2024 20:22:32 - INFO - transformers.trainer - Num examples = 46
06/23/2024 20:22:32 - INFO - transformers.trainer - Num Epochs = 8
06/23/2024 20:22:32 - INFO - transformers.trainer - Instantaneous batch size per device = 1
06/23/2024 20:22:32 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 8
06/23/2024 20:22:32 - INFO - transformers.trainer - Gradient Accumulation steps = 8
06/23/2024 20:22:32 - INFO - transformers.trainer - Total optimization steps = 40
06/23/2024 20:22:32 - INFO - transformers.trainer - Number of trainable parameters = 645,922,816
06/23/2024 20:22:44 - INFO - llamafactory.extras.callbacks - {'loss': 1.4129, 'learning_rate': 1.9969e-04, 'epoch': 0.17, 'throughput': 1290.60}
06/23/2024 20:22:56 - INFO - llamafactory.extras.callbacks - {'loss': 1.3486, 'learning_rate': 1.9877e-04, 'epoch': 0.35, 'throughput': 1327.07}
06/23/2024 20:23:09 - INFO - llamafactory.extras.callbacks - {'loss': 1.3940, 'learning_rate': 1.9724e-04, 'epoch': 0.52, 'throughput': 1338.81}
06/23/2024 20:23:21 - INFO - llamafactory.extras.callbacks - {'loss': 1.3051, 'learning_rate': 1.9511e-04, 'epoch': 0.70, 'throughput': 1343.63}
06/23/2024 20:23:33 - INFO - llamafactory.extras.callbacks - {'loss': 1.3236, 'learning_rate': 1.9239e-04, 'epoch': 0.87, 'throughput': 1332.49}
06/23/2024 20:23:46 - INFO - llamafactory.extras.callbacks - {'loss': 1.4556, 'learning_rate': 1.8910e-04, 'epoch': 1.04, 'throughput': 1321.22}
06/23/2024 20:23:59 - INFO - llamafactory.extras.callbacks - {'loss': 1.3086, 'learning_rate': 1.8526e-04, 'epoch': 1.22, 'throughput': 1318.95}
06/23/2024 20:24:12 - INFO - llamafactory.extras.callbacks - {'loss': 1.1029, 'learning_rate': 1.8090e-04, 'epoch': 1.39, 'throughput': 1313.23}
06/23/2024 20:24:25 - INFO - llamafactory.extras.callbacks - {'loss': 1.2992, 'learning_rate': 1.7604e-04, 'epoch': 1.57, 'throughput': 1307.87}
06/23/2024 20:24:37 - INFO - llamafactory.extras.callbacks - {'loss': 1.1914, 'learning_rate': 1.7071e-04, 'epoch': 1.74, 'throughput': 1306.85}
06/23/2024 20:24:50 - INFO - llamafactory.extras.callbacks - {'loss': 1.2417, 'learning_rate': 1.6494e-04, 'epoch': 1.91, 'throughput': 1303.78}
06/23/2024 20:25:03 - INFO - llamafactory.extras.callbacks - {'loss': 1.3909, 'learning_rate': 1.5878e-04, 'epoch': 2.09, 'throughput': 1301.25}
06/23/2024 20:25:16 - INFO - llamafactory.extras.callbacks - {'loss': 1.1383, 'learning_rate': 1.5225e-04, 'epoch': 2.26, 'throughput': 1300.97}
06/23/2024 20:25:28 - INFO - llamafactory.extras.callbacks - {'loss': 1.2009, 'learning_rate': 1.4540e-04, 'epoch': 2.43, 'throughput': 1298.29}
06/23/2024 20:25:41 - INFO - llamafactory.extras.callbacks - {'loss': 1.1918, 'learning_rate': 1.3827e-04, 'epoch': 2.61, 'throughput': 1296.23}
06/23/2024 20:25:54 - INFO - llamafactory.extras.callbacks - {'loss': 1.0642, 'learning_rate': 1.3090e-04, 'epoch': 2.78, 'throughput': 1296.29}
06/23/2024 20:26:07 - INFO - llamafactory.extras.callbacks - {'loss': 1.2661, 'learning_rate': 1.2334e-04, 'epoch': 2.96, 'throughput': 1294.50}
06/23/2024 20:26:20 - INFO - llamafactory.extras.callbacks - {'loss': 1.0416, 'learning_rate': 1.1564e-04, 'epoch': 3.13, 'throughput': 1293.00}
06/23/2024 20:26:32 - INFO - llamafactory.extras.callbacks - {'loss': 1.0564, 'learning_rate': 1.0785e-04, 'epoch': 3.30, 'throughput': 1293.37}
06/23/2024 20:26:45 - INFO - llamafactory.extras.callbacks - {'loss': 0.9894, 'learning_rate': 1.0000e-04, 'epoch': 3.48, 'throughput': 1292.09}
06/23/2024 20:26:58 - INFO - llamafactory.extras.callbacks - {'loss': 1.0994, 'learning_rate': 9.2154e-05, 'epoch': 3.65, 'throughput': 1291.15}
06/23/2024 20:27:11 - INFO - llamafactory.extras.callbacks - {'loss': 1.3105, 'learning_rate': 8.4357e-05, 'epoch': 3.83, 'throughput': 1291.31}
06/23/2024 20:27:24 - INFO - llamafactory.extras.callbacks - {'loss': 1.0337, 'learning_rate': 7.6655e-05, 'epoch': 4.00, 'throughput': 1290.40}
06/23/2024 20:27:37 - INFO - llamafactory.extras.callbacks - {'loss': 1.0329, 'learning_rate': 6.9098e-05, 'epoch': 4.17, 'throughput': 1289.52}
06/23/2024 20:27:49 - INFO - llamafactory.extras.callbacks - {'loss': 0.9724, 'learning_rate': 6.1732e-05, 'epoch': 4.35, 'throughput': 1289.76}
06/23/2024 20:28:02 - INFO - llamafactory.extras.callbacks - {'loss': 1.1225, 'learning_rate': 5.4601e-05, 'epoch': 4.52, 'throughput': 1288.92}
06/23/2024 20:28:15 - INFO - llamafactory.extras.callbacks - {'loss': 0.9500, 'learning_rate': 4.7750e-05, 'epoch': 4.70, 'throughput': 1288.18}
06/23/2024 20:28:28 - INFO - llamafactory.extras.callbacks - {'loss': 1.0672, 'learning_rate': 4.1221e-05, 'epoch': 4.87, 'throughput': 1288.47}
06/23/2024 20:28:41 - INFO - llamafactory.extras.callbacks - {'loss': 0.9970, 'learning_rate': 3.5055e-05, 'epoch': 5.04, 'throughput': 1287.83}
06/23/2024 20:28:54 - INFO - llamafactory.extras.callbacks - {'loss': 0.8303, 'learning_rate': 2.9289e-05, 'epoch': 5.22, 'throughput': 1287.20}
06/23/2024 20:29:06 - INFO - llamafactory.extras.callbacks - {'loss': 0.8613, 'learning_rate': 2.3959e-05, 'epoch': 5.39, 'throughput': 1287.49}
06/23/2024 20:29:19 - INFO - llamafactory.extras.callbacks - {'loss': 0.8735, 'learning_rate': 1.9098e-05, 'epoch': 5.57, 'throughput': 1286.95}
06/23/2024 20:29:32 - INFO - llamafactory.extras.callbacks - {'loss': 1.1102, 'learning_rate': 1.4736e-05, 'epoch': 5.74, 'throughput': 1286.34}
06/23/2024 20:29:45 - INFO - llamafactory.extras.callbacks - {'loss': 1.2143, 'learning_rate': 1.0899e-05, 'epoch': 5.91, 'throughput': 1286.56}
06/23/2024 20:29:58 - INFO - llamafactory.extras.callbacks - {'loss': 1.1449, 'learning_rate': 7.6120e-06, 'epoch': 6.09, 'throughput': 1286.09}
06/23/2024 20:30:11 - INFO - llamafactory.extras.callbacks - {'loss': 1.0676, 'learning_rate': 4.8943e-06, 'epoch': 6.26, 'throughput': 1285.62}
06/23/2024 20:30:23 - INFO - llamafactory.extras.callbacks - {'loss': 0.9810, 'learning_rate': 2.7630e-06, 'epoch': 6.43, 'throughput': 1285.96}
06/23/2024 20:30:36 - INFO - llamafactory.extras.callbacks - {'loss': 0.8797, 'learning_rate': 1.2312e-06, 'epoch': 6.61, 'throughput': 1285.52}
06/23/2024 20:30:49 - INFO - llamafactory.extras.callbacks - {'loss': 0.8676, 'learning_rate': 3.0827e-07, 'epoch': 6.78, 'throughput': 1285.12}
06/23/2024 20:31:02 - INFO - llamafactory.extras.callbacks - {'loss': 1.0274, 'learning_rate': 0.0000e+00, 'epoch': 6.96, 'throughput': 1285.35}
06/23/2024 20:31:02 - INFO - transformers.trainer - Training completed. Do not forget to share your model on huggingface.co/models =)
06/23/2024 20:31:02 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen2-7B-Chat/lora/final-pretrain-v2
06/23/2024 20:31:02 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen2-7B-Instruct/snapshots/41c66b0be1c3081f13defc6bdf946c2ef240d6a6/config.json
06/23/2024 20:31:02 - INFO - transformers.configuration_utils - Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
06/23/2024 20:31:04 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen2-7B-Chat/lora/final-pretrain-v2/tokenizer_config.json
06/23/2024 20:31:04 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen2-7B-Chat/lora/final-pretrain-v2/special_tokens_map.json
06/23/2024 20:31:05 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
06/23/2024 20:31:05 - INFO - transformers.modelcard - Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
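The run above ends with the LoRA adapter being written to saves/Qwen2-7B-Chat/lora/final-pretrain-v2. A minimal sketch of how that adapter could be loaded for inference with transformers + peft follows; only the base model id and the adapter path come from the log, everything else (dtype/device settings, merging the adapter) is an assumption and not part of the logged run.

# Minimal sketch: load the base Qwen2-7B-Instruct checkpoint and attach the LoRA
# adapter saved by this run. Paths/model id are taken from the log; the rest is
# an illustrative assumption, not a record of what the run itself did.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-7B-Instruct",   # base checkpoint used in the log
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

# Attach the adapter produced by this training run.
model = PeftModel.from_pretrained(base, "saves/Qwen2-7B-Chat/lora/final-pretrain-v2")

# Optionally fold the LoRA weights into the base model for plain transformers inference.
model = model.merge_and_unload()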