[INFO|parser.py:355] 2024-09-02 15:49:30,713 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
[INFO|tokenization_utils_base.py:2269] 2024-09-02 15:49:32,191 >> loading file qwen.tiktoken from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\qwen.tiktoken
[INFO|tokenization_utils_base.py:2269] 2024-09-02 15:49:32,192 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2269] 2024-09-02 15:49:32,192 >> loading file special_tokens_map.json from cache at None
[INFO|tokenization_utils_base.py:2269] 2024-09-02 15:49:32,192 >> loading file tokenizer_config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\tokenizer_config.json
[INFO|tokenization_utils_base.py:2269] 2024-09-02 15:49:32,192 >> loading file tokenizer.json from cache at None
[INFO|template.py:270] 2024-09-02 15:49:32,688 >> Add eos token: <|endoftext|>
[INFO|template.py:375] 2024-09-02 15:49:32,689 >> Add pad token: <|endoftext|>
[INFO|loader.py:52] 2024-09-02 15:49:32,690 >> Loading dataset llamafactory/glaive_toolcall_en...
[INFO|configuration_utils.py:733] 2024-09-02 15:49:40,666 >> loading configuration file config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\config.json
[INFO|configuration_utils.py:733] 2024-09-02 15:49:41,313 >> loading configuration file config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\config.json
[INFO|configuration_utils.py:800] 2024-09-02 15:49:41,314 >> Model config QWenConfig {
  "_name_or_path": "Qwen/Qwen-1_8B",
  "architectures": [
    "QWenLMHeadModel"
  ],
  "attn_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "Qwen/Qwen-1_8B--configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "Qwen/Qwen-1_8B--modeling_qwen.QWenLMHeadModel"
  },
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": false,
  "fp32": false,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 8192,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "onnx_safe": null,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "softmax_in_fp32": false,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "use_cache_kernel": false,
  "use_cache_quantization": false,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}

[INFO|modeling_utils.py:3678] 2024-09-02 15:49:41,731 >> loading weights file model.safetensors from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\model.safetensors.index.json
[INFO|modeling_utils.py:1606] 2024-09-02 16:16:57,834 >> Instantiating QWenLMHeadModel model under default dtype torch.float16.
[INFO|configuration_utils.py:1038] 2024-09-02 16:16:57,838 >> Generate config GenerationConfig {}
[INFO|modeling_utils.py:4507] 2024-09-02 16:17:02,439 >> All model checkpoint weights were used when initializing QWenLMHeadModel.
[INFO|modeling_utils.py:4515] 2024-09-02 16:17:02,440 >> All the weights of QWenLMHeadModel were initialized from the model checkpoint at Qwen/Qwen-1_8B.
If your task is similar to the task the model of the checkpoint was trained on, you can already use QWenLMHeadModel for predictions without further training.
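
The records above show the Qwen/Qwen-1_8B tokenizer files and float16 weights being resolved from the local Hugging Face cache before training starts. For reference, a minimal sketch of the equivalent manual load with transformers (model ID, dtype, and device are taken from the log; trust_remote_code is needed because the config maps to the custom QWenTokenizer / QWenLMHeadModel classes via auto_map):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Qwen-1_8B registers QWenTokenizer / QWenLMHeadModel through auto_map,
# so trust_remote_code=True is required to resolve those custom classes.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B", trust_remote_code=True)

# The log instantiates the checkpoint under torch.float16 on cuda:0.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda:0")
```
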
[INFO|modeling_utils.py:4003] 2024-09-02 16:17:12,457 >> Generation config file not found, using a generation config created from the model config.
[WARNING|checkpointing.py:70] 2024-09-02 16:17:12,489 >> You are using the old GC format, some features (e.g. BAdam) will be invalid.
[INFO|checkpointing.py:103] 2024-09-02 16:17:12,491 >> Gradient checkpointing enabled.
[INFO|attention.py:86] 2024-09-02 16:17:12,493 >> Using vanilla attention implementation.
[INFO|adapter.py:302] 2024-09-02 16:17:12,493 >> Upcasting trainable params to float32.
[INFO|adapter.py:158] 2024-09-02 16:17:12,494 >> Fine-tuning method: LoRA
[INFO|misc.py:56] 2024-09-02 16:17:12,497 >> Found linear modules: c_attn,w2,c_proj,w1
[INFO|loader.py:196] 2024-09-02 16:17:13,257 >> trainable params: 6,709,248 || all params: 1,843,537,920 || trainable%: 0.3639
[INFO|trainer.py:648] 2024-09-02 16:17:13,289 >> Using auto half precision backend
[INFO|trainer.py:2134] 2024-09-02 16:17:13,776 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-09-02 16:17:13,777 >> Num examples = 900
[INFO|trainer.py:2136] 2024-09-02 16:17:13,778 >> Num Epochs = 3
[INFO|trainer.py:2137] 2024-09-02 16:17:13,779 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2140] 2024-09-02 16:17:13,780 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:2141] 2024-09-02 16:17:13,781 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-09-02 16:17:13,782 >> Total optimization steps = 168
[INFO|trainer.py:2143] 2024-09-02 16:17:13,788 >> Number of trainable parameters = 6,709,248
[INFO|callbacks.py:319] 2024-09-02 16:17:47,749 >> {'loss': 0.9542, 'learning_rate': 4.9891e-05, 'epoch': 0.09, 'throughput': 1389.23}
[INFO|callbacks.py:319] 2024-09-02 16:18:21,440 >> {'loss': 0.7834, 'learning_rate': 4.9564e-05, 'epoch': 0.18, 'throughput': 1487.79}
[INFO|callbacks.py:319] 2024-09-02 16:18:54,465 >> {'loss': 0.7296, 'learning_rate': 4.9023e-05, 'epoch': 0.27, 'throughput': 1490.51}
[INFO|callbacks.py:319] 2024-09-02 16:19:25,676 >> {'loss': 0.6653, 'learning_rate': 4.8272e-05, 'epoch': 0.36, 'throughput': 1493.24}
[INFO|callbacks.py:319] 2024-09-02 16:19:55,399 >> {'loss': 0.6830, 'learning_rate': 4.7317e-05, 'epoch': 0.44, 'throughput': 1513.54}
[INFO|callbacks.py:319] 2024-09-02 16:20:25,205 >> {'loss': 0.5651, 'learning_rate': 4.6168e-05, 'epoch': 0.53, 'throughput': 1533.3}
[INFO|callbacks.py:319] 2024-09-02 16:20:56,227 >> {'loss': 0.5411, 'learning_rate': 4.4834e-05, 'epoch': 0.62, 'throughput': 1520.29}
[INFO|callbacks.py:319] 2024-09-02 16:21:25,069 >> {'loss': 0.5720, 'learning_rate': 4.3326e-05, 'epoch': 0.71, 'throughput': 1523.32}
[INFO|callbacks.py:319] 2024-09-02 16:21:55,779 >> {'loss': 0.6106, 'learning_rate': 4.1659e-05, 'epoch': 0.80, 'throughput': 1528.21}
[INFO|callbacks.py:319] 2024-09-02 16:22:26,052 >> {'loss': 0.5552, 'learning_rate': 3.9846e-05, 'epoch': 0.89, 'throughput': 1529.26}
[INFO|callbacks.py:319] 2024-09-02 16:22:58,326 >> {'loss': 0.5659, 'learning_rate': 3.7903e-05, 'epoch': 0.98, 'throughput': 1528.57}
[INFO|callbacks.py:319] 2024-09-02 16:23:31,779 >> {'loss': 0.5800, 'learning_rate': 3.5847e-05, 'epoch': 1.07, 'throughput': 1527.01}
[INFO|callbacks.py:319] 2024-09-02 16:24:02,611 >> {'loss': 0.5429, 'learning_rate': 3.3697e-05, 'epoch': 1.16, 'throughput': 1529.65}
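
The adapter records report LoRA applied to the c_attn, w2, c_proj, and w1 linear modules, with 6,709,248 of 1,843,537,920 parameters trainable (0.3639%). The LoRA rank and alpha are not printed, but r=8 over those modules across the 24 layers yields exactly 6,709,248 trainable parameters, so the PEFT sketch below assumes r=8 and lora_alpha=16; both values are assumptions, not read from the log:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B", torch_dtype=torch.float16, trust_remote_code=True
)

# Target module names come from the "Found linear modules" record; r/alpha are
# assumptions (r=8 reproduces the logged 6,709,248 trainable parameters here).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["c_attn", "c_proj", "w1", "w2"],
    task_type="CAUSAL_LM",
)

peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()
# -> trainable params: 6,709,248 || all params: 1,843,537,920 || trainable%: 0.3639
```
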
[INFO|callbacks.py:319] 2024-09-02 16:24:31,189 >> {'loss': 0.4384, 'learning_rate': 3.1470e-05, 'epoch': 1.24, 'throughput': 1532.32}
[INFO|callbacks.py:319] 2024-09-02 16:25:02,456 >> {'loss': 0.5749, 'learning_rate': 2.9188e-05, 'epoch': 1.33, 'throughput': 1525.37}
[INFO|callbacks.py:319] 2024-09-02 16:25:30,813 >> {'loss': 0.4222, 'learning_rate': 2.6868e-05, 'epoch': 1.42, 'throughput': 1514.93}
[INFO|callbacks.py:319] 2024-09-02 16:26:01,816 >> {'loss': 0.5560, 'learning_rate': 2.4533e-05, 'epoch': 1.51, 'throughput': 1519.0}
[INFO|callbacks.py:319] 2024-09-02 16:26:33,656 >> {'loss': 0.4963, 'learning_rate': 2.2201e-05, 'epoch': 1.60, 'throughput': 1522.01}
[INFO|callbacks.py:319] 2024-09-02 16:27:08,164 >> {'loss': 0.5085, 'learning_rate': 1.9894e-05, 'epoch': 1.69, 'throughput': 1522.94}
[INFO|callbacks.py:319] 2024-09-02 16:27:37,551 >> {'loss': 0.4726, 'learning_rate': 1.7631e-05, 'epoch': 1.78, 'throughput': 1522.11}
[INFO|trainer.py:3819] 2024-09-02 16:27:37,560 >> ***** Running Evaluation *****
[INFO|trainer.py:3821] 2024-09-02 16:27:37,560 >> Num examples = 100
[INFO|trainer.py:3824] 2024-09-02 16:27:37,561 >> Batch size = 2
[INFO|trainer.py:3503] 2024-09-02 16:27:48,996 >> Saving model checkpoint to saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-100
[INFO|configuration_utils.py:733] 2024-09-02 16:27:51,471 >> loading configuration file config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\config.json
[INFO|configuration_utils.py:800] 2024-09-02 16:27:51,473 >> Model config QWenConfig {
  "architectures": [
    "QWenLMHeadModel"
  ],
  "attn_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "Qwen/Qwen-1_8B--configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "Qwen/Qwen-1_8B--modeling_qwen.QWenLMHeadModel"
  },
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": false,
  "fp32": false,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 8192,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "onnx_safe": null,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "softmax_in_fp32": false,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "use_cache_kernel": false,
  "use_cache_quantization": false,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2684] 2024-09-02 16:27:51,605 >> tokenizer config file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-100\tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-02 16:27:51,607 >> Special tokens file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-100\special_tokens_map.json
[INFO|callbacks.py:319] 2024-09-02 16:28:22,125 >> {'loss': 0.4874, 'learning_rate': 1.5433e-05, 'epoch': 1.87, 'throughput': 1490.45}
[INFO|callbacks.py:319] 2024-09-02 16:28:55,988 >> {'loss': 0.5235, 'learning_rate': 1.3318e-05, 'epoch': 1.96, 'throughput': 1491.79}
[INFO|callbacks.py:319] 2024-09-02 16:29:28,437 >> {'loss': 0.4984, 'learning_rate': 1.1306e-05, 'epoch': 2.04, 'throughput': 1493.02}
[INFO|callbacks.py:319] 2024-09-02 16:30:01,857 >> {'loss': 0.4777, 'learning_rate': 9.4128e-06, 'epoch': 2.13, 'throughput': 1498.71}
[INFO|callbacks.py:319] 2024-09-02 16:30:30,832 >> {'loss': 0.4918, 'learning_rate': 7.6560e-06, 'epoch': 2.22, 'throughput': 1498.79}
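
The training header is internally consistent: 900 examples at a per-device batch size of 2 give 450 micro-batches per epoch, accumulating over 8 of them gives 56 optimizer steps per epoch (the trailing partial accumulation window is not counted), and 3 epochs give the logged 168 total optimization steps. The learning-rate column then follows a cosine decay with no warmup; assuming a 5e-5 peak (the peak itself is not printed in the log), the schedule reproduces the logged values:

```python
import math

num_examples = 900           # "Num examples = 900"
per_device_batch_size = 2    # "Instantaneous batch size per device = 2"
grad_accum = 8               # "Gradient Accumulation steps = 8"
num_epochs = 3               # "Num Epochs = 3"

micro_batches_per_epoch = num_examples // per_device_batch_size  # 450
steps_per_epoch = micro_batches_per_epoch // grad_accum          # 56
total_steps = steps_per_epoch * num_epochs                       # 168, as logged

# Cosine decay with no warmup; the 5e-5 peak is an assumption inferred from
# the early logged values, logging happens every 5 optimization steps.
def cosine_lr(step, peak=5e-5, total=total_steps):
    return 0.5 * peak * (1.0 + math.cos(math.pi * step / total))

print(total_steps)       # 168
print(cosine_lr(5))      # ~4.9891e-05, matching the first logged learning rate
print(cosine_lr(10))     # ~4.9564e-05, matching the second
```
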
[INFO|callbacks.py:319] 2024-09-02 16:31:00,464 >> {'loss': 0.4573, 'learning_rate': 6.0507e-06, 'epoch': 2.31, 'throughput': 1494.08}
[INFO|callbacks.py:319] 2024-09-02 16:31:30,565 >> {'loss': 0.4549, 'learning_rate': 4.6110e-06, 'epoch': 2.40, 'throughput': 1494.64}
[INFO|callbacks.py:319] 2024-09-02 16:32:01,843 >> {'loss': 0.5422, 'learning_rate': 3.3494e-06, 'epoch': 2.49, 'throughput': 1497.43}
[INFO|callbacks.py:319] 2024-09-02 16:32:34,313 >> {'loss': 0.4968, 'learning_rate': 2.2769e-06, 'epoch': 2.58, 'throughput': 1500.22}
[INFO|callbacks.py:319] 2024-09-02 16:33:04,696 >> {'loss': 0.4361, 'learning_rate': 1.4029e-06, 'epoch': 2.67, 'throughput': 1497.23}
[INFO|callbacks.py:319] 2024-09-02 16:33:34,901 >> {'loss': 0.4517, 'learning_rate': 7.3509e-07, 'epoch': 2.76, 'throughput': 1498.31}
[INFO|callbacks.py:319] 2024-09-02 16:34:05,001 >> {'loss': 0.4099, 'learning_rate': 2.7923e-07, 'epoch': 2.84, 'throughput': 1498.38}
[INFO|callbacks.py:319] 2024-09-02 16:34:34,884 >> {'loss': 0.4861, 'learning_rate': 3.9330e-08, 'epoch': 2.93, 'throughput': 1504.61}
[INFO|trainer.py:3503] 2024-09-02 16:34:53,706 >> Saving model checkpoint to saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-168
[INFO|configuration_utils.py:733] 2024-09-02 16:34:56,148 >> loading configuration file config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\config.json
[INFO|configuration_utils.py:800] 2024-09-02 16:34:56,150 >> Model config QWenConfig {
  "architectures": [
    "QWenLMHeadModel"
  ],
  "attn_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "Qwen/Qwen-1_8B--configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "Qwen/Qwen-1_8B--modeling_qwen.QWenLMHeadModel"
  },
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": false,
  "fp32": false,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 8192,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "onnx_safe": null,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "softmax_in_fp32": false,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "use_cache_kernel": false,
  "use_cache_quantization": false,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2684] 2024-09-02 16:34:56,239 >> tokenizer config file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-168\tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-02 16:34:56,240 >> Special tokens file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-168\special_tokens_map.json
[INFO|trainer.py:2394] 2024-09-02 16:34:56,859 >> Training completed. Do not forget to share your model on huggingface.co/models =)
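
Training finishes with the LoRA adapter written to saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-168. A sketch of reloading that adapter on top of the float16 base model for a quick smoke test, assuming the standard PEFT loading path (the prompt is an arbitrary example, not taken from the dataset):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B", trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B", torch_dtype=torch.float16, trust_remote_code=True
).to("cuda:0")

# Attach the adapter saved at the last optimization step (path taken from the log).
model = PeftModel.from_pretrained(
    base, r"saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\checkpoint-168"
)
model.eval()

inputs = tokenizer("What's the weather like in Boston today?", return_tensors="pt").to("cuda:0")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
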
[INFO|trainer.py:3503] 2024-09-02 16:34:56,864 >> Saving model checkpoint to saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54
[INFO|configuration_utils.py:733] 2024-09-02 16:34:58,089 >> loading configuration file config.json from cache at C:\Users\22320\.cache\huggingface\hub\models--Qwen--Qwen-1_8B\snapshots\fa6e214ccbbc6a55235c26ef406355b6bfdf5eed\config.json
[INFO|configuration_utils.py:800] 2024-09-02 16:34:58,092 >> Model config QWenConfig {
  "architectures": [
    "QWenLMHeadModel"
  ],
  "attn_dropout_prob": 0.0,
  "auto_map": {
    "AutoConfig": "Qwen/Qwen-1_8B--configuration_qwen.QWenConfig",
    "AutoModelForCausalLM": "Qwen/Qwen-1_8B--modeling_qwen.QWenLMHeadModel"
  },
  "bf16": false,
  "emb_dropout_prob": 0.0,
  "fp16": false,
  "fp32": false,
  "hidden_size": 2048,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "kv_channels": 128,
  "layer_norm_epsilon": 1e-06,
  "max_position_embeddings": 8192,
  "model_type": "qwen",
  "no_bias": true,
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "onnx_safe": null,
  "rotary_emb_base": 10000,
  "rotary_pct": 1.0,
  "scale_attn_weights": true,
  "seq_length": 8192,
  "softmax_in_fp32": false,
  "tie_word_embeddings": false,
  "tokenizer_class": "QWenTokenizer",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "use_cache_kernel": false,
  "use_cache_quantization": false,
  "use_dynamic_ntk": true,
  "use_flash_attn": "auto",
  "use_logn_attn": true,
  "vocab_size": 151936
}

[INFO|tokenization_utils_base.py:2684] 2024-09-02 16:34:58,235 >> tokenizer config file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\tokenizer_config.json
[INFO|tokenization_utils_base.py:2693] 2024-09-02 16:34:58,238 >> Special tokens file saved in saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54\special_tokens_map.json
[WARNING|ploting.py:89] 2024-09-02 16:34:59,062 >> No metric eval_accuracy to plot.
[INFO|trainer.py:3819] 2024-09-02 16:34:59,082 >> ***** Running Evaluation *****
[INFO|trainer.py:3821] 2024-09-02 16:34:59,084 >> Num examples = 100
[INFO|trainer.py:3824] 2024-09-02 16:34:59,086 >> Batch size = 2
[INFO|modelcard.py:449] 2024-09-02 16:35:12,284 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
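
The run ends with the adapter saved once more to the run directory itself, a final evaluation over the 100 held-out examples, and a model-card result that is dropped for missing fields; the eval_accuracy warning only means no accuracy metric was computed, not that evaluation failed. If a standalone model is wanted for deployment, the adapter can be folded into the base weights; a sketch under the assumption that merging into the float16 weights is acceptable (the export directory name is hypothetical):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

ADAPTER_DIR = r"saves\Qwen-1.8B\lora\train_2024-09-02-15-46-54"  # final save directory from the log
EXPORT_DIR = "qwen-1_8b-glaive-toolcall-merged"                  # hypothetical output name

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-1_8B", torch_dtype=torch.float16, trust_remote_code=True
)
model = PeftModel.from_pretrained(base, ADAPTER_DIR)

# Fold the LoRA deltas into the base weights and write a standalone checkpoint.
merged = model.merge_and_unload()
merged.save_pretrained(EXPORT_DIR)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-1_8B", trust_remote_code=True)
tokenizer.save_pretrained(EXPORT_DIR)
```
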