05/21/2024 22:33:38 - INFO - transformers.tokenization_utils_base - loading file vocab.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/4d14e384a4b037942bb3f3016665157c8bcb70ea/vocab.json
05/21/2024 22:33:38 - INFO - transformers.tokenization_utils_base - loading file merges.txt from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/4d14e384a4b037942bb3f3016665157c8bcb70ea/merges.txt
05/21/2024 22:33:38 - INFO - transformers.tokenization_utils_base - loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/4d14e384a4b037942bb3f3016665157c8bcb70ea/tokenizer.json
05/21/2024 22:33:38 - INFO - transformers.tokenization_utils_base - loading file added_tokens.json from cache at None
05/21/2024 22:33:38 - INFO - transformers.tokenization_utils_base - loading file special_tokens_map.json from cache at None
05/21/2024 22:33:38 - INFO - transformers.tokenization_utils_base - loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/4d14e384a4b037942bb3f3016665157c8bcb70ea/tokenizer_config.json
05/21/2024 22:33:38 - WARNING - transformers.tokenization_utils_base - Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
05/21/2024 22:33:38 - INFO - llamafactory.data.template - Replace eos token: <|im_end|>
05/21/2024 22:33:38 - INFO - llamafactory.data.loader - Loading dataset identity.json...
05/21/2024 22:33:52 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/4d14e384a4b037942bb3f3016665157c8bcb70ea/config.json
05/21/2024 22:33:52 - INFO - transformers.configuration_utils - Model config Qwen2Config {
  "_name_or_path": "Qwen/Qwen1.5-0.5B-Chat",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
05/21/2024 22:34:00 - INFO - transformers.modeling_utils - loading weights file model.safetensors from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/4d14e384a4b037942bb3f3016665157c8bcb70ea/model.safetensors
05/21/2024 22:34:00 - INFO - transformers.modeling_utils - Instantiating Qwen2ForCausalLM model under default dtype torch.float16.
05/21/2024 22:34:00 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151645
}
05/21/2024 22:34:03 - INFO - transformers.modeling_utils - All model checkpoint weights were used when initializing Qwen2ForCausalLM.
05/21/2024 22:34:03 - INFO - transformers.modeling_utils - All the weights of Qwen2ForCausalLM were initialized from the model checkpoint at Qwen/Qwen1.5-0.5B-Chat. If your task is similar to the task the model of the checkpoint was trained on, you can already use Qwen2ForCausalLM for predictions without further training.
05/21/2024 22:34:04 - INFO - transformers.generation.configuration_utils - loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/4d14e384a4b037942bb3f3016665157c8bcb70ea/generation_config.json
05/21/2024 22:34:04 - INFO - transformers.generation.configuration_utils - Generate config GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": [
    151645,
    151643
  ],
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "top_p": 0.8
}
05/21/2024 22:34:04 - INFO - llamafactory.model.utils.checkpointing - Gradient checkpointing enabled.
05/21/2024 22:34:04 - INFO - llamafactory.model.utils.attention - Using torch SDPA for faster training and inference.
05/21/2024 22:34:04 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
05/21/2024 22:34:04 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
05/21/2024 22:34:05 - INFO - llamafactory.model.loader - trainable params: 786432 || all params: 464774144 || trainable%: 0.1692
05/21/2024 22:34:05 - INFO - transformers.trainer - Using auto half precision backend
05/21/2024 22:34:05 - INFO - transformers.trainer - ***** Running training *****
05/21/2024 22:34:05 - INFO - transformers.trainer - Num examples = 91
05/21/2024 22:34:05 - INFO - transformers.trainer - Num Epochs = 3
05/21/2024 22:34:05 - INFO - transformers.trainer - Instantaneous batch size per device = 2
05/21/2024 22:34:05 - INFO - transformers.trainer - Total train batch size (w. parallel, distributed & accumulation) = 16
05/21/2024 22:34:05 - INFO - transformers.trainer - Gradient Accumulation steps = 8
05/21/2024 22:34:05 - INFO - transformers.trainer - Total optimization steps = 15
05/21/2024 22:34:05 - INFO - transformers.trainer - Number of trainable parameters = 786,432
05/21/2024 22:34:13 - INFO - llamafactory.extras.callbacks - {'loss': 3.4077, 'learning_rate': 3.7500e-05, 'epoch': 0.87}
05/21/2024 22:34:22 - INFO - llamafactory.extras.callbacks - {'loss': 3.3417, 'learning_rate': 1.2500e-05, 'epoch': 1.74}
05/21/2024 22:34:29 - INFO - llamafactory.extras.callbacks - {'loss': 3.2838, 'learning_rate': 0.0000e+00, 'epoch': 2.61}
05/21/2024 22:34:29 - INFO - transformers.trainer - Training completed. Do not forget to share your model on huggingface.co/models =)
05/21/2024 22:34:29 - INFO - transformers.trainer - Saving model checkpoint to saves/Qwen1.5-0.5B-Chat/lora/QwenTT2
05/21/2024 22:34:29 - INFO - transformers.configuration_utils - loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--Qwen--Qwen1.5-0.5B-Chat/snapshots/4d14e384a4b037942bb3f3016665157c8bcb70ea/config.json
05/21/2024 22:34:29 - INFO - transformers.configuration_utils - Model config Qwen2Config {
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 1024,
  "initializer_range": 0.02,
  "intermediate_size": 2816,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "num_key_value_heads": 16,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": true,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.41.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
05/21/2024 22:34:29 - INFO - transformers.tokenization_utils_base - tokenizer config file saved in saves/Qwen1.5-0.5B-Chat/lora/QwenTT2/tokenizer_config.json
05/21/2024 22:34:29 - INFO - transformers.tokenization_utils_base - Special tokens file saved in saves/Qwen1.5-0.5B-Chat/lora/QwenTT2/special_tokens_map.json
05/21/2024 22:34:30 - WARNING - llamafactory.extras.ploting - No metric eval_loss to plot.
05/21/2024 22:34:30 - INFO - transformers.modelcard - Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}