Using RTX 3090 or 4000 series which doesn't support faster communication speedups. Ensuring P2P and IB communications are disabled. 01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - We recommend enable `upcast_layernorm` in quantized training. 01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - We recommend enable mixed precision training. 01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training. [INFO|training_args.py:1838] 2024-01-04 09:53:50,866 >> PyTorch: setting up devices /home/hangyu5/anaconda3/envs/llama_factory/lib/python3.11/site-packages/transformers/training_args.py:1751: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead. warnings.warn( 01/04/2024 09:53:50 - INFO - llmtuner.model.parser - Process rank: 0, device: cuda:0, n_gpu: 1 distributed training: True, compute dtype: None 01/04/2024 09:53:50 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_persistent_workers=False, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, dispatch_batches=None, do_eval=True, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=IntervalStrategy.EPOCH, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, 
generation_num_beams=None, gradient_accumulation_steps=4, gradient_checkpointing=False, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=HubStrategy.EVERY_SAVE, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=5e-05, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/runs/Jan04_09-53-50_yhyu13fuwuqi, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=IntervalStrategy.STEPS, lr_scheduler_kwargs={}, lr_scheduler_type=SchedulerType.COSINE, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=1.0, optim=OptimizerNames.ADAMW_TORCH, optim_args=None, output_dir=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=1, per_device_train_batch_size=1, predict_with_generate=False, prediction_loss_only=True, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=IntervalStrategy.STEPS, save_total_limit=None, seed=42, skip_memory_metrics=True, sortish_sampler=False, split_batches=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, 
torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, ) 01/04/2024 09:53:50 - INFO - llmtuner.data.loader - Loading dataset ./glaive-function-calling-v2/simple-function-calling-v2_converted.json... 01/04/2024 09:53:50 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json. 01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - We recommend enable `upcast_layernorm` in quantized training. 01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - We recommend enable mixed precision training. 01/04/2024 09:53:50 - WARNING - llmtuner.model.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training. /home/hangyu5/anaconda3/envs/llama_factory/lib/python3.11/site-packages/transformers/training_args.py:1751: FutureWarning: `--push_to_hub_token` is deprecated and will be removed in version 5 of 🤗 Transformers. Use `--hub_token` instead. 
warnings.warn( 01/04/2024 09:53:50 - INFO - llmtuner.model.parser - Process rank: 1, device: cuda:1, n_gpu: 1 distributed training: True, compute dtype: None 01/04/2024 09:53:50 - INFO - llmtuner.model.parser - Training/evaluation parameters Seq2SeqTrainingArguments( _n_gpu=1, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, bf16=False, bf16_full_eval=False, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=0, dataloader_persistent_workers=False, dataloader_pin_memory=True, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=False, ddp_timeout=1800, debug=[], deepspeed=None, disable_tqdm=False, dispatch_batches=None, do_eval=True, do_predict=False, do_train=True, eval_accumulation_steps=None, eval_delay=0, eval_steps=None, evaluation_strategy=IntervalStrategy.EPOCH, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, generation_config=None, generation_max_length=None, generation_num_beams=None, gradient_accumulation_steps=4, gradient_checkpointing=False, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=None, hub_private_repo=False, hub_strategy=HubStrategy.EVERY_SAVE, hub_token=, ignore_data_skip=False, include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=5e-05, length_column_name=length, load_best_model_at_end=False, local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/runs/Jan04_09-53-50_yhyu13fuwuqi, 
logging_first_step=False, logging_nan_inf_filter=True, logging_steps=10, logging_strategy=IntervalStrategy.STEPS, lr_scheduler_kwargs={}, lr_scheduler_type=SchedulerType.COSINE, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mp_parameters=, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=1.0, optim=OptimizerNames.ADAMW_TORCH, optim_args=None, output_dir=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora, overwrite_output_dir=True, past_index=-1, per_device_eval_batch_size=1, per_device_train_batch_size=1, predict_with_generate=False, prediction_loss_only=True, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, ray_scope=last, remove_unused_columns=True, report_to=['tensorboard'], resume_from_checkpoint=None, run_name=./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=IntervalStrategy.STEPS, save_total_limit=None, seed=42, skip_memory_metrics=True, sortish_sampler=False, split_batches=False, tf32=None, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_mps_device=False, warmup_ratio=0.0, warmup_steps=0, weight_decay=0.0, ) 01/04/2024 09:53:50 - INFO - llmtuner.data.loader - Loading dataset ./glaive-function-calling-v2/simple-function-calling-v2_converted.json... 01/04/2024 09:53:50 - WARNING - llmtuner.data.utils - Checksum failed: missing SHA-1 hash value in dataset_info.json. Using custom data configuration default-b024aadef2a1493c Loading Dataset Infos from /home/hangyu5/anaconda3/envs/llama_factory/lib/python3.11/site-packages/datasets/packaged_modules/json Overwrite dataset info from restored data version if exists. 
Loading Dataset info from /home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96 Found cached dataset json (/home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96) Loading Dataset info from /home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96 [INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file vocab.json [INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file merges.txt [INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file added_tokens.json [INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file special_tokens_map.json [INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file tokenizer_config.json [INFO|tokenization_utils_base.py:2024] 2024-01-04 09:53:51,685 >> loading file tokenizer.json [WARNING|logging.py:314] 2024-01-04 09:53:51,743 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 
[INFO|configuration_utils.py:737] 2024-01-04 09:53:51,744 >> loading configuration file cognitivecomputations/dolphin-2_6-phi-2/config.json [INFO|configuration_utils.py:737] 2024-01-04 09:53:51,749 >> loading configuration file cognitivecomputations/dolphin-2_6-phi-2/config.json [INFO|configuration_utils.py:802] 2024-01-04 09:53:51,750 >> Model config PhiConfig { "_name_or_path": "cognitivecomputations/dolphin-2_6-phi-2", "activation_function": "gelu_new", "architectures": [ "PhiForCausalLM" ], "attn_pdrop": 0.0, "auto_map": { "AutoConfig": "configuration_phi.PhiConfig", "AutoModelForCausalLM": "modeling_phi.PhiForCausalLM" }, "embd_pdrop": 0.0, "flash_attn": false, "flash_rotary": false, "fused_dense": false, "img_processor": null, "initializer_range": 0.02, "layer_norm_epsilon": 1e-05, "model_type": "phi-msft", "n_embd": 2560, "n_head": 32, "n_head_kv": null, "n_inner": null, "n_layer": 32, "n_positions": 2048, "resid_pdrop": 0.1, "rotary_dim": 32, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.36.2", "use_cache": false, "vocab_size": 51200 } 01/04/2024 09:53:51 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit. Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. 01/04/2024 09:53:51 - INFO - llmtuner.model.patcher - Quantizing model to 4 bit. [INFO|modeling_utils.py:2907] 2024-01-04 09:53:51,820 >> Overriding torch_dtype=None with `torch_dtype=torch.float16` due to requirements of `bitsandbytes` to enable model loading in 8-bit or 4-bit. Pass your own torch_dtype to specify the dtype of the remaining non-linear layers or pass torch_dtype=torch.float16 to remove this warning. [INFO|modeling_utils.py:3341] 2024-01-04 09:53:51,820 >> loading weights file cognitivecomputations/dolphin-2_6-phi-2/model.safetensors.index.json [INFO|modeling_utils.py:1341] 2024-01-04 09:53:51,821 >> Instantiating PhiForCausalLM model under default dtype torch.float16. 
[INFO|configuration_utils.py:826] 2024-01-04 09:53:51,821 >> Generate config GenerationConfig { "use_cache": false }
[INFO|configuration_utils.py:826] 2024-01-04 09:53:51,822 >> Generate config GenerationConfig { "use_cache": false }
[INFO|modeling_utils.py:3483] 2024-01-04 09:53:51,875 >> Detected 4-bit loading: activating 4-bit loading for this model
All the weights of PhiForCausalLM were initialized from the model checkpoint at ./models/dolphin-2_6-phi-2.
If your task is similar to the task the model of the checkpoint was trained on, you can already use PhiForCausalLM for predictions without further training.
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.47it/s]
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.30it/s]
Some weights of the model checkpoint at ./models/dolphin-2_6-phi-2 were not used when initializing PhiForCausalLM: ['lm_head.linear.lora_B.default.weight', 'lm_head.linear.lora_A.default.weight']
- This IS expected if you are initializing PhiForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing PhiForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[INFO|configuration_utils.py:779] 2024-01-04 09:53:53,733 >> loading configuration file ./models/dolphin-2_6-phi-2/generation_config.json
[INFO|configuration_utils.py:826] 2024-01-04 09:53:53,733 >> Generate config GenerationConfig {}
[WARNING|modeling_utils.py:2045] 2024-01-04 09:53:53,816 >> You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file.
To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
01/04/2024 09:53:53 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
01/04/2024 09:53:53 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
You are using an old version of the checkpointing format that is deprecated (We will also silently ignore `gradient_checkpointing_kwargs` in case you passed it).Please update to the new format on your modeling file. To use the new format, you need to completely remove the definition of the method `_set_gradient_checkpointing` in your model.
01/04/2024 09:53:53 - INFO - llmtuner.model.patcher - Gradient checkpointing enabled.
01/04/2024 09:53:53 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA
01/04/2024 09:53:53 - INFO - llmtuner.model.loader - trainable params: 2621440 || all params: 2782305280 || trainable%: 0.0942
01/04/2024 09:53:53 - INFO - llmtuner.model.loader - trainable params: 2621440 || all params: 2782305280 || trainable%: 0.0942
Token indices sequence length is longer than the specified maximum sequence length for this model (2217 > 2048).
Running this sequence through the model will result in indexing errors Caching processed dataset at /home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-c64b6c6785bc1929.arrow Running tokenizer on dataset: 30%|██▉ | 1000/3347 [00:02<00:06, 372.68 examples/s] Running tokenizer on dataset: 60%|█████▉ | 2000/3347 [00:05<00:03, 387.09 examples/s] Running tokenizer on dataset: 90%|████████▉ | 3000/3347 [00:07<00:00, 395.52 examples/s] Running tokenizer on dataset: 100%|██████████| 3347/3347 [00:08<00:00, 396.84 examples/s] Running tokenizer on dataset: 100%|██████████| 3347/3347 [00:08<00:00, 392.48 examples/s] input_ids: [32, 8537, 1022, 257, 11040, 2836, 290, 281, 11666, 4430, 8796, 13, 383, 8796, 3607, 7613, 11, 6496, 11, 290, 23507, 7429, 284, 262, 2836, 338, 2683, 13, 198, 20490, 25, 36230, 25, 921, 389, 257, 7613, 8796, 351, 1895, 284, 262, 1708, 5499, 13, 5765, 606, 611, 2672, 532, 198, 90, 198, 50284, 1, 3672, 1298, 366, 1136, 62, 1069, 3803, 62, 4873, 1600, 198, 50284, 1, 11213, 1298, 366, 3855, 262, 5163, 2494, 1022, 734, 19247, 1600, 198, 50284, 1, 17143, 7307, 1298, 1391, 198, 50280, 1, 4906, 1298, 366, 15252, 1600, 198, 50280, 1, 48310, 1298, 1391, 198, 50276, 1, 8692, 62, 34415, 1298, 1391, 198, 50272, 1, 4906, 1298, 366, 8841, 1600, 198, 50272, 1, 11213, 1298, 366, 464, 7395, 284, 10385, 422, 1, 198, 50276, 5512, 198, 50276, 1, 16793, 62, 34415, 1298, 1391, 198, 50272, 1, 4906, 1298, 366, 8841, 1600, 198, 50272, 1, 11213, 1298, 366, 464, 7395, 284, 10385, 284, 1, 198, 50276, 92, 198, 50280, 5512, 198, 50280, 1, 35827, 1298, 685, 198, 50276, 1, 8692, 62, 34415, 1600, 198, 50276, 1, 16793, 62, 34415, 1, 198, 50280, 60, 198, 50284, 92, 198, 92, 198, 198, 6090, 345, 1492, 257, 5474, 329, 502, 422, 968, 1971, 284, 3576, 30, 198, 48902, 25, 40, 1101, 7926, 11, 475, 314, 836, 470, 423, 262, 12971, 284, 1492, 13956, 13, 2011, 1459, 2163, 3578, 502, 284, 651, 262, 5163, 
2494, 1022, 734, 19247, 13, 1002, 345, 761, 1037, 351, 326, 11, 1254, 1479, 284, 1265, 0, 50295] inputs: A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. Human: SYSTEM: You are a helpful assistant with access to the following functions. Use them if required - { "name": "get_exchange_rate", "description": "Get the exchange rate between two currencies", "parameters": { "type": "object", "properties": { "base_currency": { "type": "string", "description": "The currency to convert from" }, "target_currency": { "type": "string", "description": "The currency to convert to" } }, "required": [ "base_currency", "target_currency" ] } } Can you book a flight for me from New York to London? Assistant:I'm sorry, but I don't have the capability to book flights. My current function allows me to get the exchange rate between two currencies. If you need help with that, feel free to ask!<|im_end|> label_ids: [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 
-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 40, 1101, 7926, 11, 475, 314, 836, 470, 423, 262, 12971, 284, 1492, 13956, 13, 2011, 1459, 2163, 3578, 502, 284, 651, 262, 5163, 2494, 1022, 734, 19247, 13, 1002, 345, 761, 1037, 351, 326, 11, 1254, 1479, 284, 1265, 0, 50295] labels: I'm sorry, but I don't have the capability to book flights. My current function allows me to get the exchange rate between two currencies. If you need help with that, feel free to ask!<|im_end|> [INFO|training_args.py:1838] 2024-01-04 09:54:03,936 >> PyTorch: setting up devices Caching indices mapping at /home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-2d738e000d25696c.arrow Caching indices mapping at /home/hangyu5/.cache/huggingface/datasets/json/default-b024aadef2a1493c/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96/cache-fe95a5c264c6067e.arrow Running tokenizer on dataset: 0%| | 0/3347 [00:00 2048). 
Running this sequence through the model will result in indexing errors
Running tokenizer on dataset: 30%|██▉ | 1000/3347 [00:02<00:06, 375.58 examples/s]
Running tokenizer on dataset: 60%|█████▉ | 2000/3347 [00:05<00:03, 389.75 examples/s]
Running tokenizer on dataset: 90%|████████▉ | 3000/3347 [00:07<00:00, 396.16 examples/s]
Running tokenizer on dataset: 100%|██████████| 3347/3347 [00:08<00:00, 395.57 examples/s]
Running tokenizer on dataset: 100%|██████████| 3347/3347 [00:08<00:00, 392.61 examples/s]
[INFO|trainer.py:1706] 2024-01-04 09:54:13,452 >> ***** Running training *****
[INFO|trainer.py:1707] 2024-01-04 09:54:13,452 >> Num examples = 3,011
[INFO|trainer.py:1708] 2024-01-04 09:54:13,452 >> Num Epochs = 1
[INFO|trainer.py:1709] 2024-01-04 09:54:13,452 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1712] 2024-01-04 09:54:13,452 >> Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1713] 2024-01-04 09:54:13,452 >> Gradient Accumulation steps = 4
[INFO|trainer.py:1714] 2024-01-04 09:54:13,452 >> Total optimization steps = 376
[INFO|trainer.py:1715] 2024-01-04 09:54:13,454 >> Number of trainable parameters = 2,621,440
[...]
***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-04 10:02:58,683 >> Num examples = 335
[INFO|trainer.py:3171] 2024-01-04 10:02:58,683 >> Batch size = 1
[...]
Training completed.
Do not forget to share your model on huggingface.co/models =)
{'train_runtime': 553.4721, 'train_samples_per_second': 5.44, 'train_steps_per_second': 0.679, 'train_loss': 0.4441075046011742, 'epoch': 1.0}
100%|██████████| 376/376 [09:13<00:00, 1.47s/it]
100%|██████████| 376/376 [09:13<00:00, 1.47s/it]
[INFO|trainer.py:2889] 2024-01-04 10:03:26,930 >> Saving model checkpoint to ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora
[INFO|tokenization_utils_base.py:2432] 2024-01-04 10:03:26,973 >> tokenizer config file saved in ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2024-01-04 10:03:26,974 >> Special tokens file saved in ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/special_tokens_map.json
[INFO|tokenization_utils_base.py:2492] 2024-01-04 10:03:26,974 >> added tokens file saved in ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/added_tokens.json
***** train metrics *****
  epoch = 1.0
  train_loss = 0.4441
  train_runtime = 0:09:13.47
  train_samples_per_second = 5.44
  train_steps_per_second = 0.679
Figure saved: ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/training_loss.png
Figure saved: ./models/sft/dolphin-2_6-phi-2-sft-glaive-function-calling-v2-ep1-lora/training_eval_loss.png
[INFO|trainer.py:3166] 2024-01-04 10:03:27,895 >> ***** Running Evaluation *****
[INFO|trainer.py:3168] 2024-01-04 10:03:27,895 >> Num examples = 335
[INFO|trainer.py:3171] 2024-01-04 10:03:27,895 >> Batch size = 1
[...]
Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
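
The headline numbers in this log can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check, not the Trainer's actual code path: it assumes two data-parallel processes (inferred from the `Process rank: 0` and `Process rank: 1` lines), that each rank receives roughly half of the examples, and that one optimizer step is taken per `gradient_accumulation_steps` batches. The variable names are illustrative.

```python
import math

# Values copied from the log.
num_train_examples = 3011
num_eval_examples = 335
per_device_train_batch_size = 1
per_device_eval_batch_size = 1
gradient_accumulation_steps = 4
num_train_epochs = 1
train_runtime_s = 553.4721
trainable_params = 2621440
all_params = 2782305280

# Assumption: two processes, inferred from ranks 0 and 1 in the log.
world_size = 2

# Effective batch size per optimizer step.
total_train_batch_size = (
    per_device_train_batch_size * world_size * gradient_accumulation_steps
)

# Assumption: each rank sees ceil(N / (batch * world_size)) batches and
# steps once every `gradient_accumulation_steps` batches.
batches_per_rank = math.ceil(
    num_train_examples / (per_device_train_batch_size * world_size)
)
total_steps = (batches_per_rank // gradient_accumulation_steps) * num_train_epochs

# Evaluation has no accumulation, so steps are just batches per rank.
eval_steps = math.ceil(num_eval_examples / (per_device_eval_batch_size * world_size))

# Share of parameters updated by LoRA, in percent.
trainable_pct = 100 * trainable_params / all_params

print(total_train_batch_size)                          # 8
print(total_steps)                                     # 376
print(eval_steps)                                      # 168
print(round(trainable_pct, 4))                         # 0.0942
print(round(num_train_examples / train_runtime_s, 2))  # 5.44
print(round(total_steps / train_runtime_s, 3))         # 0.679
```

These match the logged `Total train batch size`, `Total optimization steps`, the `0/168` evaluation progress bar, `trainable%`, `train_samples_per_second`, and `train_steps_per_second`.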