[INFO|parser.py:355] 2024-09-02 15:49:30,713 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2269] 2024-09-02 15:49:32,191 >> loading file qwen.tiktoken from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\qwen.tiktoken
[INFO|tokenization_utils_base.py:2269] 2024-09-02 15:49:32,192 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2269] 2024-09-02 15:49:32,192 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2269] 2024-09-02 15:49:32,192 >> loading file tokenizer_config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\tokenizer_config.json
[INFO|tokenization_utils_base.py:2269] 2024-09-02 15:49:32,192 >> loading file tokenizer.json from cache at None
[INFO|template.py:270] 2024-09-02 15:49:32,688 >> Add eos token: <|endoftext|>
[INFO|template.py:375] 2024-09-02 15:49:32,689 >> Add pad token: <|endoftext|>
[INFO|loader.py:52] 2024-09-02 15:49:32,690 >> Loading dataset llamafactory/glaive_toolcall_en...
[INFO|configuration_utils.py:733] 2024-09-02 15:49:40,666 >> loading configuration file config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\config.json
[INFO|configuration_utils.py:733] 2024-09-02 15:49:41,313 >> loading configuration file config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\config.json
[INFO|configuration_utils.py:800] 2024-09-02 15:49:41,314 >> Model config QWenConfig {
  "_name_or_path": "Qwen/Qwen-1_8B",
  "architectures": [
    "QWenLMHeadModel"
  ],
  "attn_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "Qwen/Qwen-1_8B--configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "Qwen/Qwen-1_8B--modeling_qwen.QWenLMHeadModel"
  },
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": false,
  "fp32": false,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 8192,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "onnx_safe": null,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "softmax_in_fp32": false,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "use_cache_kernel": false,
  "use_cache_quantization": false,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}

[INFO|modeling_utils.py:3678] 2024-09-02 15:49:41,731 >> loading weights file model.safetensors from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\model.safetensors.index.json
[INFO|modeling_utils.py:1606] 2024-09-02 16:16:57,834 >> Instantiating QWenLMHeadModel model under default dtype torch.float16.
[INFO|configuration_utils.py:1038] 2024-09-02 16:16:57,838 >> Generate config GenerationConfig {}
[INFO|modeling_utils.py:4507] 2024-09-02 16:17:02,439 >> All model checkpoint weights were used when initializing QWenLMHeadModel.
[INFO|modeling_utils.py:4515] 2024-09-02 16:17:02,440 >> All the weights of QWenLMHeadModel were initialized from the model checkpoint at Qwen/Qwen-1_8B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use QWenLMHeadModel for predictions without further training.
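
The records above show the Qwen/Qwen-1_8B tokenizer files and float16 weights being resolved from the local Hugging Face cache before training starts. For reference, a minimal sketch of the equivalent manual load with transformers (model ID, dtype, and device are taken from the log; trust_remote_code is needed because the config maps to the custom QWenTokenizer / QWenLMHeadModel classes via auto_map):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Qwen-1_8B registers QWenTokenizer / QWenLMHeadModel through auto_map,
# so trust_remote_code=True is required to resolve those custom classes.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B", trust_remote_code=True)

# The log instantiates the checkpoint under torch.float16 on cuda:0.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda:0")
```
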
[INFO|modeling_utils.py:4003] 2024-09-02 16:17:12,457 >> Generation config file not found, using a generation config created from the model config.
[WARNING|checkpointing.py:70] 2024-09-02 16:17:12,489 >> You are using the old GC format, some features (e.g. BAdam) will be invalid.
[INFO|checkpointing.py:103] 2024-09-02 16:17:12,491 >> Gradient checkpointing enabled.
[INFO|attention.py:86] 2024-09-02 16:17:12,493 >> Using vanilla attention implementation.
[INFO|adapter.py:302] 2024-09-02 16:17:12,493 >> Upcasting trainable params to float32.
[INFO|adapter.py:158] 2024-09-02 16:17:12,494 >> Fine-tuning method: LoRA
[INFO|misc.py:56] 2024-09-02 16:17:12,497 >> Found linear modules: c_attn,w2,c_proj,w1
[INFO|loader.py:196] 2024-09-02 16:17:13,257 >> trainable params: 6,709,248 || all params: 1,843,537,920 || trainable%: 0.3639
[INFO|trainer.py:648] 2024-09-02 16:17:13,289 >> Using auto half precision backend
[INFO|trainer.py:2134] 2024-09-02 16:17:13,776 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-09-02 16:17:13,777 >> Num examples = 900
[INFO|trainer.py:2136] 2024-09-02 16:17:13,778 >> Num Epochs = 3
[INFO|trainer.py:2137] 2024-09-02 16:17:13,779 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2140] 2024-09-02 16:17:13,780 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2141] 2024-09-02 16:17:13,781 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-09-02 16:17:13,782 >> Total optimization steps = 168
[INFO|trainer.py:2143] 2024-09-02 16:17:13,788 >> Number of trainable parameters = 6,709,248
[INFO|callbacks.py:319] 2024-09-02 16:17:47,749 >> {'loss': 0.9542, 'learning_rate': 4.9891e-05, 'epoch': 0.09, 'throughput': 1389.23}
[INFO|callbacks.py:319] 2024-09-02 16:18:21,440 >> {'loss': 0.7834, 'learning_rate': 4.9564e-05, 'epoch': 0.18, 'throughput': 1487.79}
[INFO|callbacks.py:319] 2024-09-02 16:18:54,465 >> {'loss': 0.7296, 'learning_rate': 4.9023e-05, 'epoch': 0.27, 'throughput': 1490.51}
[INFO|callbacks.py:319] 2024-09-02 16:19:25,676 >> {'loss': 0.6653, 'learning_rate': 4.8272e-05, 'epoch': 0.36, 'throughput': 1493.24}
[INFO|callbacks.py:319] 2024-09-02 16:19:55,399 >> {'loss': 0.6830, 'learning_rate': 4.7317e-05, 'epoch': 0.44, 'throughput': 1513.54}
[INFO|callbacks.py:319] 2024-09-02 16:20:25,205 >> {'loss': 0.5651, 'learning_rate': 4.6168e-05, 'epoch': 0.53, 'throughput': 1533.3}
[INFO|callbacks.py:319] 2024-09-02 16:20:56,227 >> {'loss': 0.5411, 'learning_rate': 4.4834e-05, 'epoch': 0.62, 'throughput': 1520.29}
[INFO|callbacks.py:319] 2024-09-02 16:21:25,069 >> {'loss': 0.5720, 'learning_rate': 4.3326e-05, 'epoch': 0.71, 'throughput': 1523.32}
[INFO|callbacks.py:319] 2024-09-02 16:21:55,779 >> {'loss': 0.6106, 'learning_rate': 4.1659e-05, 'epoch': 0.80, 'throughput': 1528.21}
[INFO|callbacks.py:319] 2024-09-02 16:22:26,052 >> {'loss': 0.5552, 'learning_rate': 3.9846e-05, 'epoch': 0.89, 'throughput': 1529.26}
[INFO|callbacks.py:319] 2024-09-02 16:22:58,326 >> {'loss': 0.5659, 'learning_rate': 3.7903e-05, 'epoch': 0.98, 'throughput': 1528.57}
[INFO|callbacks.py:319] 2024-09-02 16:23:31,779 >> {'loss': 0.5800, 'learning_rate': 3.5847e-05, 'epoch': 1.07, 'throughput': 1527.01}
[INFO|callbacks.py:319] 2024-09-02 16:24:02,611 >> {'loss': 0.5429, 'learning_rate': 3.3697e-05, 'epoch': 1.16, 'throughput': 1529.65}
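
The adapter records report LoRA applied to the c_attn, w2, c_proj, and w1 linear modules, with 6,709,248 of 1,843,537,920 parameters trainable (0.3639%). The LoRA rank and alpha are not printed, but r=8 over those modules across the 24 layers yields exactly 6,709,248 trainable parameters, so the PEFT sketch below assumes r=8 and lora_alpha=16; both values are assumptions, not read from the log:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B", torch_dtype=torch.float16, trust_remote_code=True
)

# Target module names come from the "Found linear modules" record; r/alpha are
# assumptions (r=8 reproduces the logged 6,709,248 trainable parameters here).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["c_attn", "c_proj", "w1", "w2"],
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()
# -> trainable params: 6,709,248 || all params: 1,843,537,920 || trainable%: 0.3639
```
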
[INFO|callbacks.py:319] 2024-09-02 16:24:31,189 >> {'loss': 0.4384, 'learning_rate': 3.1470e-05, 'epoch': 1.24, 'throughput': 1532.32}
[INFO|callbacks.py:319] 2024-09-02 16:25:02,456 >> {'loss': 0.5749, 'learning_rate': 2.9188e-05, 'epoch': 1.33, 'throughput': 1525.37}
[INFO|callbacks.py:319] 2024-09-02 16:25:30,813 >> {'loss': 0.4222, 'learning_rate': 2.6868e-05, 'epoch': 1.42, 'throughput': 1514.93}
[INFO|callbacks.py:319] 2024-09-02 16:26:01,816 >> {'loss': 0.5560, 'learning_rate': 2.4533e-05, 'epoch': 1.51, 'throughput': 1519.0}
[INFO|callbacks.py:319] 2024-09-02 16:26:33,656 >> {'loss': 0.4963, 'learning_rate': 2.2201e-05, 'epoch': 1.60, 'throughput': 1522.01}
[INFO|callbacks.py:319] 2024-09-02 16:27:08,164 >> {'loss': 0.5085, 'learning_rate': 1.9894e-05, 'epoch': 1.69, 'throughput': 1522.94}
[INFO|callbacks.py:319] 2024-09-02 16:27:37,551 >> {'loss': 0.4726, 'learning_rate': 1.7631e-05, 'epoch': 1.78, 'throughput': 1522.11}
[INFO|trainer.py:3819] 2024-09-02 16:27:37,560 >> ***** Running Evaluation *****
[INFO|trainer.py:3821] 2024-09-02 16:27:37,560 >> Num examples = 100
[INFO|trainer.py:3824] 2024-09-02 16:27:37,561 >> Batch size = 2
[INFO|trainer.py:3503] 2024-09-02 16:27:48,996 >> Saving model checkpoint to saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-100
[INFO|configuration_utils.py:733] 2024-09-02 16:27:51,471 >> loading configuration file config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\config.json
[INFO|configuration_utils.py:800] 2024-09-02 16:27:51,473 >> Model config QWenConfig {
  "architectures": [
    "QWenLMHeadModel"
  ],
  "attn_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "Qwen/Qwen-1_8B--configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "Qwen/Qwen-1_8B--modeling_qwen.QWenLMHeadModel"
  },
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": false,
  "fp32": false,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 8192,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "onnx_safe": null,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "softmax_in_fp32": false,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "use_cache_kernel": false,
  "use_cache_quantization": false,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2684] 2024-09-02 16:27:51,605 >> tokenizer config file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-100\tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-02 16:27:51,607 >> Special tokens file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-100\special_tokens_map.json
[INFO|callbacks.py:319] 2024-09-02 16:28:22,125 >> {'loss': 0.4874, 'learning_rate': 1.5433e-05, 'epoch': 1.87, 'throughput': 1490.45}
[INFO|callbacks.py:319] 2024-09-02 16:28:55,988 >> {'loss': 0.5235, 'learning_rate': 1.3318e-05, 'epoch': 1.96, 'throughput': 1491.79}
[INFO|callbacks.py:319] 2024-09-02 16:29:28,437 >> {'loss': 0.4984, 'learning_rate': 1.1306e-05, 'epoch': 2.04, 'throughput': 1493.02}
[INFO|callbacks.py:319] 2024-09-02 16:30:01,857 >> {'loss': 0.4777, 'learning_rate': 9.4128e-06, 'epoch': 2.13, 'throughput': 1498.71}
[INFO|callbacks.py:319] 2024-09-02 16:30:30,832 >> {'loss': 0.4918, 'learning_rate': 7.6560e-06, 'epoch': 2.22, 'throughput': 1498.79}
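
The training header is internally consistent: 900 examples at a per-device batch size of 2 give 450 micro-batches per epoch, accumulating over 8 of them gives 56 optimizer steps per epoch (the trailing partial accumulation window is not counted), and 3 epochs give the logged 168 total optimization steps. The learning-rate column then follows a cosine decay with no warmup; assuming a 5e-5 peak (the peak itself is not printed in the log), the schedule reproduces the logged values:

```python
import math

num_examples = 900           # "Num examples = 900"
per_device_batch_size = 2    # "Instantaneous batch size per device = 2"
grad_accum = 8               # "Gradient Accumulation steps = 8"
num_epochs = 3               # "Num Epochs = 3"

micro_batches_per_epoch = num_examples // per_device_batch_size  # 450
steps_per_epoch = micro_batches_per_epoch // grad_accum          # 56
total_steps = steps_per_epoch * num_epochs                       # 168, as logged

# Cosine decay with no warmup; the 5e-5 peak is an assumption inferred from
# the early logged values, logging happens every 5 optimization steps.
def cosine_lr(step, peak=5e-5, total=total_steps):
    return 0.5 * peak * (1.0 + math.cos(math.pi * step / total))

print(total_steps)       # 168
print(cosine_lr(5))      # ~4.9891e-05, matching the first logged learning rate
print(cosine_lr(10))     # ~4.9564e-05, matching the second
```
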
[INFO|callbacks.py:319] 2024-09-02 16:31:00,464 >> {'loss': 0.4573, 'learning_rate': 6.0507e-06, 'epoch': 2.31, 'throughput': 1494.08}
[INFO|callbacks.py:319] 2024-09-02 16:31:30,565 >> {'loss': 0.4549, 'learning_rate': 4.6110e-06, 'epoch': 2.40, 'throughput': 1494.64}
[INFO|callbacks.py:319] 2024-09-02 16:32:01,843 >> {'loss': 0.5422, 'learning_rate': 3.3494e-06, 'epoch': 2.49, 'throughput': 1497.43}
[INFO|callbacks.py:319] 2024-09-02 16:32:34,313 >> {'loss': 0.4968, 'learning_rate': 2.2769e-06, 'epoch': 2.58, 'throughput': 1500.22}
[INFO|callbacks.py:319] 2024-09-02 16:33:04,696 >> {'loss': 0.4361, 'learning_rate': 1.4029e-06, 'epoch': 2.67, 'throughput': 1497.23}
[INFO|callbacks.py:319] 2024-09-02 16:33:34,901 >> {'loss': 0.4517, 'learning_rate': 7.3509e-07, 'epoch': 2.76, 'throughput': 1498.31}
[INFO|callbacks.py:319] 2024-09-02 16:34:05,001 >> {'loss': 0.4099, 'learning_rate': 2.7923e-07, 'epoch': 2.84, 'throughput': 1498.38}
[INFO|callbacks.py:319] 2024-09-02 16:34:34,884 >> {'loss': 0.4861, 'learning_rate': 3.9330e-08, 'epoch': 2.93, 'throughput': 1504.61}
[INFO|trainer.py:3503] 2024-09-02 16:34:53,706 >> Saving model checkpoint to saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-168
[INFO|configuration_utils.py:733] 2024-09-02 16:34:56,148 >> loading configuration file config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\config.json
[INFO|configuration_utils.py:800] 2024-09-02 16:34:56,150 >> Model config QWenConfig {
  "architectures": [
    "QWenLMHeadModel"
  ],
  "attn_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "Qwen/Qwen-1_8B--configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "Qwen/Qwen-1_8B--modeling_qwen.QWenLMHeadModel"
  },
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": false,
  "fp32": false,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 8192,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "onnx_safe": null,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "softmax_in_fp32": false,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "use_cache_kernel": false,
  "use_cache_quantization": false,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2684] 2024-09-02 16:34:56,239 >> tokenizer config file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-168\tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-02 16:34:56,240 >> Special tokens file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-168\special_tokens_map.json
[INFO|trainer.py:2394] 2024-09-02 16:34:56,859 >> Training completed. Do not forget to share your model on huggingface.co/models =)
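
Training finishes with the LoRA adapter written to saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-168. A sketch of reloading that adapter on top of the float16 base model for a quick smoke test, assuming the standard PEFT loading path (the prompt is an arbitrary example, not taken from the dataset):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B", trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B", torch_dtype=torch.float16, trust_remote_code=True
).to("cuda:0")

# Attach the adapter saved at the last optimization step (path taken from the log).
model = PeftModel.from_pretrained(
    base, r"saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-168"
)
model.eval()

inputs = tokenizer("What's the weather like in Boston today?", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
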
[INFO|trainer.py:3503] 2024-09-02 16:34:56,864 >> Saving model checkpoint to saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54
[INFO|configuration_utils.py:733] 2024-09-02 16:34:58,089 >> loading configuration file config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\config.json
[INFO|configuration_utils.py:800] 2024-09-02 16:34:58,092 >> Model config QWenConfig {
  "architectures": [
    "QWenLMHeadModel"
  ],
  "attn_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "Qwen/Qwen-1_8B--configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "Qwen/Qwen-1_8B--modeling_qwen.QWenLMHeadModel"
  },
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": false,
  "fp32": false,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 8192,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "onnx_safe": null,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "softmax_in_fp32": false,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "use_cache_kernel": false,
  "use_cache_quantization": false,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2684] 2024-09-02 16:34:58,235 >> tokenizer config file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-02 16:34:58,238 >> Special tokens file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\special_tokens_map.json
[WARNING|ploting.py:89] 2024-09-02 16:34:59,062 >> No metric eval_accuracy to plot.
[INFO|trainer.py:3819] 2024-09-02 16:34:59,082 >> ***** Running Evaluation *****
[INFO|trainer.py:3821] 2024-09-02 16:34:59,084 >> Num examples = 100
[INFO|trainer.py:3824] 2024-09-02 16:34:59,086 >> Batch size = 2
[INFO|modelcard.py:449] 2024-09-02 16:35:12,284 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
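
The run ends with the adapter saved once more to the run directory itself, a final evaluation over the 100 held-out examples, and a model-card result that is dropped for missing fields; the eval_accuracy warning only means no accuracy metric was computed, not that evaluation failed. If a standalone model is wanted for deployment, the adapter can be folded into the base weights; a sketch under the assumption that merging into the float16 weights is acceptable (the export directory name is hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

ADAPTER_DIR = r"saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54"  # final save directory from the log
EXPORT_DIR = "qwen-1_8b-glaive-toolcall-merged"                  # hypothetical output name

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B", torch_dtype=torch.float16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base, ADAPTER_DIR)

# Fold the LoRA deltas into the base weights and write a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained(EXPORT_DIR)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B", trust_remote_code=True)
tokenizer.save_pretrained(EXPORT_DIR)
```
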