[WARNING|parser.py:272] 2024-07-24 15:04:58,287 >> We recommend enable `upcast_layernorm` in quantized training.
[WARNING|parser.py:292] 2024-07-24 15:04:58,287 >> `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
[INFO|parser.py:344] 2024-07-24 15:04:58,288 >> Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
07/24/2024 15:04:58 - WARNING - llamafactory.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
07/24/2024 15:04:58 - WARNING - llamafactory.hparams.parser - `ddp_find_unused_parameters` needs to be set as False for LoRA in DDP training.
07/24/2024 15:04:58 - INFO - llamafactory.hparams.parser - Process rank: 1, device: cuda:1, n_gpu: 1, distributed training: True, compute dtype: torch.bfloat16
[INFO|tokenization_utils_base.py:2289] 2024-07-24 15:05:00,642 >> loading file tokenizer.model from cache at /workspace/data/huggingface-cache/hub/models--microsoft--Phi-3-medium-128k-instruct/snapshots/cae1d42b5577398fd1be9f0746052562ae552886/tokenizer.model
[INFO|tokenization_utils_base.py:2289] 2024-07-24 15:05:00,643 >> loading file tokenizer.json from cache at /workspace/data/huggingface-cache/hub/models--microsoft--Phi-3-medium-128k-instruct/snapshots/cae1d42b5577398fd1be9f0746052562ae552886/tokenizer.json
[INFO|tokenization_utils_base.py:2289] 2024-07-24 15:05:00,643 >> loading file added_tokens.json from cache at /workspace/data/huggingface-cache/hub/models--microsoft--Phi-3-medium-128k-instruct/snapshots/cae1d42b5577398fd1be9f0746052562ae552886/added_tokens.json
[INFO|tokenization_utils_base.py:2289] 2024-07-24 15:05:00,643 >> loading file special_tokens_map.json from cache at /workspace/data/huggingface-cache/hub/models--microsoft--Phi-3-medium-128k-instruct/snapshots/cae1d42b5577398fd1be9f0746052562ae552886/special_tokens_map.json
[INFO|tokenization_utils_base.py:2289] 2024-07-24 15:05:00,643 >> loading file tokenizer_config.json from cache at /workspace/data/huggingface-cache/hub/models--microsoft--Phi-3-medium-128k-instruct/snapshots/cae1d42b5577398fd1be9f0746052562ae552886/tokenizer_config.json
[INFO|tokenization_utils_base.py:2533] 2024-07-24 15:05:00,693 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[INFO|loader.py:52] 2024-07-24 15:05:00,694 >> Loading dataset dataset_alpaca_IT_train_and_eval_25K.json...
07/24/2024 15:05:08 - INFO - llamafactory.data.loader - Loading dataset dataset_alpaca_IT_train_and_eval_25K.json...
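The two warnings at the top refer to the `upcast_layernorm` and `ddp_find_unused_parameters` training options; the rest of this block is the tokenizer and dataset load. The tokenizer in question can be fetched with the standard transformers API. A minimal sketch, using only the model ID that appears in the log (everything else is generic):

```python
from transformers import AutoTokenizer

# Same checkpoint the log resolves from the local Hugging Face cache.
tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-medium-128k-instruct",
    trust_remote_code=True,  # Phi-3 registers custom classes via auto_map
)
print(len(tokenizer))  # base vocab plus the added special tokens mentioned above
```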
[INFO|configuration_utils.py:733] 2024-07-24 15:05:11,124 >> loading configuration file config.json from cache at /workspace/data/huggingface-cache/hub/models--microsoft--Phi-3-medium-128k-instruct/snapshots/cae1d42b5577398fd1be9f0746052562ae552886/config.json
[INFO|configuration_utils.py:733] 2024-07-24 15:05:11,485 >> loading configuration file config.json from cache at /workspace/data/huggingface-cache/hub/models--microsoft--Phi-3-medium-128k-instruct/snapshots/cae1d42b5577398fd1be9f0746052562ae552886/config.json
[INFO|configuration_utils.py:800] 2024-07-24 15:05:11,488 >> Model config Phi3Config {
  "_name_or_path": "microsoft/Phi-3-medium-128k-instruct",
  "architectures": [ "Phi3ForCausalLM" ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": { "AutoConfig": "microsoft/Phi-3-medium-128k-instruct--configuration_phi3.Phi3Config", "AutoModelForCausalLM": "microsoft/Phi-3-medium-128k-instruct--modeling_phi3.Phi3ForCausalLM" },
  "bos_token_id": 1,
  "embd_pdrop": 0.0,
  "eos_token_id": 32000,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 17920,
  "max_position_embeddings": 131072,
  "model_type": "phi3",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 10,
  "original_max_position_embeddings": 4096,
  "pad_token_id": null,
  "resid_pdrop": 0.0,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "long_factor": [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.25, 1.25, 1.5, 2.0, 2.75, 5.75, 5.75, 6.5, 9.25, 11.0, 13.25, 19.25, 19.75, 19.75, 21.25, 21.5, 26.5, 30.0, 33.75, 35.25, 38.5, 42.0, 42.25, 46.0, 47.0, 50.0, 50.5, 51.0, 52.0, 52.75, 53.75, 54.75, 57.0, 57.25, 58.5, 59.25, 59.5, 62.0, 62.5, 62.75, 63.25, 63.25, 63.25, 63.75, 64.0, 64.0, 64.25, 64.5, 64.5, 65.0, 65.0 ],
    "short_factor": [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.01, 1.02, 1.02, 1.04, 1.04, 1.07, 1.07, 1.1, 1.3000000000000003, 1.3000000000000003, 1.5000000000000004, 1.5700000000000005, 1.9000000000000008, 2.3100000000000014, 2.759999999999992, 3.3899999999999784, 3.9399999999999666, 4.009999999999965, 4.289999999999959, 4.349999999999958, 5.349999999999937, 6.659999999999909, 7.029999999999901, 7.51999999999989, 8.00999999999988, 8.249999999999876, 8.279999999999875, 9.629999999999846, 9.89999999999984, 10.589999999999826, 11.049999999999816, 11.7899999999998, 12.189999999999792, 12.889999999999777, 13.129999999999772, 13.16999999999977, 13.20999999999977, 13.479999999999764, 13.539999999999763, 13.779999999999758, 13.929999999999755, 14.429999999999744, 14.759999999999737, 15.149999999999729, 15.419999999999723, 15.53999999999972, 15.659999999999718, 15.749999999999716, 15.759999999999716, 15.799999999999715, 16.05999999999971, 16.079999999999714, 16.11999999999972, 16.11999999999972, 16.18999999999973, 16.31999999999975, 16.539999999999786, 16.799999999999827 ],
    "type": "su"
  },
  "rope_theta": 10000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.1",
  "use_cache": true,
  "vocab_size": 32064
}
[INFO|quantization.py:182] 2024-07-24 15:05:11,496 >> Quantizing model to 4 bit with bitsandbytes.
[INFO|modeling_utils.py:3621] 2024-07-24 15:05:12,104 >> loading weights file model.safetensors from cache at /workspace/data/huggingface-cache/hub/models--microsoft--Phi-3-medium-128k-instruct/snapshots/cae1d42b5577398fd1be9f0746052562ae552886/model.safetensors.index.json
07/24/2024 15:05:12 - INFO - llamafactory.model.model_utils.quantization - Quantizing model to 4 bit with bitsandbytes.
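The "Quantizing model to 4 bit with bitsandbytes" line corresponds to a bitsandbytes quantization config passed through transformers when the base model is loaded. A minimal sketch; the log only confirms 4-bit bitsandbytes with a bfloat16 compute dtype, so the NF4 quant type and double quantization below are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed QLoRA-style settings; only 4-bit + bfloat16 compute dtype are confirmed by the log.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # assumption, not stated in the log
    bnb_4bit_use_double_quant=True,   # assumption, not stated in the log
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-medium-128k-instruct",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```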
[INFO|modeling_utils.py:1569] 2024-07-24 15:10:52,981 >> Instantiating Phi3ForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:1038] 2024-07-24 15:10:52,989 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": 32000 }
[INFO|modeling_utils.py:4450] 2024-07-24 15:11:18,809 >> All model checkpoint weights were used when initializing Phi3ForCausalLM.
[INFO|modeling_utils.py:4458] 2024-07-24 15:11:18,810 >> All the weights of Phi3ForCausalLM were initialized from the model checkpoint at microsoft/Phi-3-medium-128k-instruct. If your task is similar to the task the model of the checkpoint was trained on, you can already use Phi3ForCausalLM for predictions without further training.
[INFO|configuration_utils.py:993] 2024-07-24 15:11:18,895 >> loading configuration file generation_config.json from cache at /workspace/data/huggingface-cache/hub/models--microsoft--Phi-3-medium-128k-instruct/snapshots/cae1d42b5577398fd1be9f0746052562ae552886/generation_config.json
[INFO|configuration_utils.py:1038] 2024-07-24 15:11:18,896 >> Generate config GenerationConfig { "bos_token_id": 1, "eos_token_id": [ 32000, 32001, 32007 ], "pad_token_id": 32000 }
07/24/2024 15:12:39 - INFO - llamafactory.model.model_utils.checkpointing - Gradient checkpointing enabled.
07/24/2024 15:12:39 - INFO - llamafactory.model.model_utils.attention - Using FlashAttention-2 for faster training and inference.
07/24/2024 15:12:39 - INFO - llamafactory.model.adapter - Upcasting trainable params to float32.
07/24/2024 15:12:39 - INFO - llamafactory.model.adapter - Fine-tuning method: LoRA
07/24/2024 15:12:39 - INFO - llamafactory.model.model_utils.misc - Found linear modules: o_proj,gate_up_proj,qkv_proj,down_proj
07/24/2024 15:12:39 - INFO - llamafactory.model.loader - trainable params: 27,852,800 || all params: 13,988,090,880 || trainable%: 0.1991
[INFO|checkpointing.py:103] 2024-07-24 15:12:42,061 >> Gradient checkpointing enabled.
[INFO|attention.py:82] 2024-07-24 15:12:42,061 >> Using FlashAttention-2 for faster training and inference.
[INFO|adapter.py:302] 2024-07-24 15:12:42,061 >> Upcasting trainable params to float32.
[INFO|adapter.py:158] 2024-07-24 15:12:42,061 >> Fine-tuning method: LoRA
[INFO|misc.py:51] 2024-07-24 15:12:42,062 >> Found linear modules: o_proj,qkv_proj,down_proj,gate_up_proj
[INFO|loader.py:196] 2024-07-24 15:12:42,467 >> trainable params: 27,852,800 || all params: 13,988,090,880 || trainable%: 0.1991
[INFO|trainer.py:648] 2024-07-24 15:12:42,473 >> Using auto half precision backend
[INFO|deepspeed.py:329] 2024-07-24 15:12:42,673 >> Detected ZeRO Offload and non-DeepSpeed optimizers: This combination should work as long as the custom optimizer has both CPU and GPU implementation (except LAMB)
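The adapter lines above map onto a PEFT LoraConfig over the four fused Phi-3 projection modules. A minimal sketch: the target modules and parameter counts come from the log, and rank 8 is the value consistent with 27,852,800 trainable parameters (8 x 87,040 LoRA parameters per layer x 40 layers), while alpha and dropout are assumptions:

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                     # consistent with the logged trainable-parameter count
    lora_alpha=16,           # assumption, not stated in the log
    lora_dropout=0.0,        # assumption, not stated in the log
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],  # from the log
    task_type="CAUSAL_LM",
)

# `model` is the 4-bit base model from the previous sketch.
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
# should report 27,852,800 trainable parameters, matching the log
```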
[INFO|trainer.py:2134] 2024-07-24 15:13:06,954 >> ***** Running training *****
[INFO|trainer.py:2135] 2024-07-24 15:13:06,954 >> Num examples = 4,944
[INFO|trainer.py:2136] 2024-07-24 15:13:06,954 >> Num Epochs = 3
[INFO|trainer.py:2137] 2024-07-24 15:13:06,954 >> Instantaneous batch size per device = 2
[INFO|trainer.py:2140] 2024-07-24 15:13:06,954 >> Total train batch size (w. parallel, distributed & accumulation) = 32
[INFO|trainer.py:2141] 2024-07-24 15:13:06,954 >> Gradient Accumulation steps = 8
[INFO|trainer.py:2142] 2024-07-24 15:13:06,954 >> Total optimization steps = 462
[INFO|trainer.py:2143] 2024-07-24 15:13:06,958 >> Number of trainable parameters = 27,852,800
[INFO|callbacks.py:310] 2024-07-24 15:21:31,280 >> {'loss': 0.5099, 'learning_rate': 1.0000e-05, 'epoch': 0.06, 'throughput': 2571.26}
[INFO|callbacks.py:310] 2024-07-24 15:30:13,855 >> {'loss': 0.5115, 'learning_rate': 2.0000e-05, 'epoch': 0.13, 'throughput': 2534.92}
[INFO|callbacks.py:310] 2024-07-24 15:38:39,051 >> {'loss': 0.4846, 'learning_rate': 3.0000e-05, 'epoch': 0.19, 'throughput': 2535.35}
[INFO|callbacks.py:310] 2024-07-24 15:47:32,541 >> {'loss': 0.4076, 'learning_rate': 4.0000e-05, 'epoch': 0.26, 'throughput': 2553.97}
[INFO|callbacks.py:310] 2024-07-24 15:56:37,450 >> {'loss': 0.3073, 'learning_rate': 5.0000e-05, 'epoch': 0.32, 'throughput': 2556.06}
[INFO|callbacks.py:310] 2024-07-24 16:04:50,564 >> {'loss': 0.2516, 'learning_rate': 4.9927e-05, 'epoch': 0.39, 'throughput': 2559.12}
[INFO|callbacks.py:310] 2024-07-24 16:13:54,185 >> {'loss': 0.2256, 'learning_rate': 4.9710e-05, 'epoch': 0.45, 'throughput': 2549.95}
[INFO|callbacks.py:310] 2024-07-24 16:22:56,812 >> {'loss': 0.2146, 'learning_rate': 4.9349e-05, 'epoch': 0.52, 'throughput': 2547.79}
[INFO|callbacks.py:310] 2024-07-24 16:31:48,796 >> {'loss': 0.2018, 'learning_rate': 4.8846e-05, 'epoch': 0.58, 'throughput': 2552.59}
[INFO|callbacks.py:310] 2024-07-24 16:40:22,764 >> {'loss': 0.1958, 'learning_rate': 4.8205e-05, 'epoch': 0.65, 'throughput': 2556.94}
[INFO|callbacks.py:310] 2024-07-24 16:48:46,503 >> {'loss': 0.1912, 'learning_rate': 4.7429e-05, 'epoch': 0.71, 'throughput': 2557.17}
[INFO|callbacks.py:310] 2024-07-24 16:57:16,752 >> {'loss': 0.1876, 'learning_rate': 4.6522e-05, 'epoch': 0.78, 'throughput': 2558.65}
[INFO|callbacks.py:310] 2024-07-24 17:05:34,669 >> {'loss': 0.1802, 'learning_rate': 4.5491e-05, 'epoch': 0.84, 'throughput': 2561.14}
[INFO|callbacks.py:310] 2024-07-24 17:14:13,797 >> {'loss': 0.1793, 'learning_rate': 4.4340e-05, 'epoch': 0.91, 'throughput': 2560.01}
[INFO|callbacks.py:310] 2024-07-24 17:22:00,855 >> {'loss': 0.1759, 'learning_rate': 4.3077e-05, 'epoch': 0.97, 'throughput': 2565.87}
[INFO|callbacks.py:310] 2024-07-24 17:30:17,877 >> {'loss': 0.1746, 'learning_rate': 4.1709e-05, 'epoch': 1.04, 'throughput': 2564.86}
[INFO|callbacks.py:310] 2024-07-24 17:38:59,853 >> {'loss': 0.1699, 'learning_rate': 4.0244e-05, 'epoch': 1.10, 'throughput': 2564.04}
[INFO|callbacks.py:310] 2024-07-24 17:48:09,599 >> {'loss': 0.1680, 'learning_rate': 3.8690e-05, 'epoch': 1.17, 'throughput': 2560.03}
[INFO|callbacks.py:310] 2024-07-24 17:56:04,957 >> {'loss': 0.1646, 'learning_rate': 3.7057e-05, 'epoch': 1.23, 'throughput': 2563.38}
[INFO|callbacks.py:310] 2024-07-24 18:04:29,792 >> {'loss': 0.1667, 'learning_rate': 3.5354e-05, 'epoch': 1.29, 'throughput': 2565.53}
[INFO|callbacks.py:310] 2024-07-24 18:13:16,644 >> {'loss': 0.1664, 'learning_rate': 3.3590e-05, 'epoch': 1.36, 'throughput': 2565.84}
[INFO|callbacks.py:310] 2024-07-24 18:21:27,030 >> {'loss': 0.1622, 'learning_rate': 3.1777e-05, 'epoch': 1.42, 'throughput': 2565.58}
[INFO|callbacks.py:310] 2024-07-24 18:30:03,234 >> {'loss': 0.1623, 'learning_rate': 2.9924e-05, 'epoch': 1.49, 'throughput': 2565.43}
[INFO|callbacks.py:310] 2024-07-24 18:38:50,374 >> {'loss': 0.1616, 'learning_rate': 2.8043e-05, 'epoch': 1.55, 'throughput': 2565.51}
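The trainer header is internally consistent, and the logged step count can be reproduced from it. A small worked check using only numbers that appear in the log (the GPU count follows from the two ranks, cuda:0 and cuda:1):

```python
num_examples = 4_944
per_device_batch = 2
num_gpus = 2          # ranks 0 and 1 in the log
grad_accum = 8
epochs = 3

effective_batch = per_device_batch * num_gpus * grad_accum
steps_per_epoch = num_examples // effective_batch   # integer division, partial step dropped
total_steps = steps_per_epoch * epochs

print(effective_batch)   # 32, matches "Total train batch size"
print(total_steps)       # 462, matches "Total optimization steps"
```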
[INFO|callbacks.py:310] 2024-07-24 18:47:03,606 >> {'loss': 0.1590, 'learning_rate': 2.6143e-05, 'epoch': 1.62, 'throughput': 2566.31}
[INFO|callbacks.py:310] 2024-07-24 18:55:49,268 >> {'loss': 0.1619, 'learning_rate': 2.4238e-05, 'epoch': 1.68, 'throughput': 2564.17}
[INFO|callbacks.py:310] 2024-07-24 19:04:24,775 >> {'loss': 0.1616, 'learning_rate': 2.2336e-05, 'epoch': 1.75, 'throughput': 2565.53}
[INFO|callbacks.py:310] 2024-07-24 19:12:40,325 >> {'loss': 0.1604, 'learning_rate': 2.0450e-05, 'epoch': 1.81, 'throughput': 2566.61}
[INFO|callbacks.py:310] 2024-07-24 19:21:50,945 >> {'loss': 0.1563, 'learning_rate': 1.8591e-05, 'epoch': 1.88, 'throughput': 2564.70}
[INFO|callbacks.py:310] 2024-07-24 19:30:42,384 >> {'loss': 0.1548, 'learning_rate': 1.6769e-05, 'epoch': 1.94, 'throughput': 2565.36}
[INFO|callbacks.py:310] 2024-07-24 19:39:26,495 >> {'loss': 0.1555, 'learning_rate': 1.4994e-05, 'epoch': 2.01, 'throughput': 2565.22}
[INFO|callbacks.py:310] 2024-07-24 19:48:16,049 >> {'loss': 0.1526, 'learning_rate': 1.3278e-05, 'epoch': 2.07, 'throughput': 2564.82}
[INFO|callbacks.py:310] 2024-07-24 19:56:55,676 >> {'loss': 0.1526, 'learning_rate': 1.1630e-05, 'epoch': 2.14, 'throughput': 2563.60}
[INFO|callbacks.py:310] 2024-07-24 20:05:48,055 >> {'loss': 0.1516, 'learning_rate': 1.0060e-05, 'epoch': 2.20, 'throughput': 2564.75}
[INFO|callbacks.py:310] 2024-07-24 20:14:05,975 >> {'loss': 0.1524, 'learning_rate': 8.5762e-06, 'epoch': 2.27, 'throughput': 2565.56}
[INFO|callbacks.py:310] 2024-07-24 20:23:04,597 >> {'loss': 0.1502, 'learning_rate': 7.1880e-06, 'epoch': 2.33, 'throughput': 2564.08}
[INFO|callbacks.py:310] 2024-07-24 20:31:47,755 >> {'loss': 0.1506, 'learning_rate': 5.9035e-06, 'epoch': 2.39, 'throughput': 2563.27}
[INFO|callbacks.py:310] 2024-07-24 20:40:43,735 >> {'loss': 0.1479, 'learning_rate': 4.7298e-06, 'epoch': 2.46, 'throughput': 2560.31}
[INFO|callbacks.py:310] 2024-07-24 20:49:04,924 >> {'loss': 0.1501, 'learning_rate': 3.6740e-06, 'epoch': 2.52, 'throughput': 2560.79}
[INFO|callbacks.py:310] 2024-07-24 20:57:37,960 >> {'loss': 0.1504, 'learning_rate': 2.7422e-06, 'epoch': 2.59, 'throughput': 2561.70}
[INFO|callbacks.py:310] 2024-07-24 21:05:54,158 >> {'loss': 0.1524, 'learning_rate': 1.9397e-06, 'epoch': 2.65, 'throughput': 2561.78}
[INFO|callbacks.py:310] 2024-07-24 21:14:51,826 >> {'loss': 0.1494, 'learning_rate': 1.2712e-06, 'epoch': 2.72, 'throughput': 2559.88}
[INFO|callbacks.py:310] 2024-07-24 21:23:35,877 >> {'loss': 0.1515, 'learning_rate': 7.4056e-07, 'epoch': 2.78, 'throughput': 2559.75}
[INFO|callbacks.py:310] 2024-07-24 21:32:00,699 >> {'loss': 0.1504, 'learning_rate': 3.5095e-07, 'epoch': 2.85, 'throughput': 2559.97}
[INFO|callbacks.py:310] 2024-07-24 21:40:43,444 >> {'loss': 0.1498, 'learning_rate': 1.0459e-07, 'epoch': 2.91, 'throughput': 2559.45}
[INFO|callbacks.py:310] 2024-07-24 21:48:52,834 >> {'loss': 0.1469, 'learning_rate': 2.9071e-09, 'epoch': 2.98, 'throughput': 2559.91}
[INFO|trainer.py:3503] 2024-07-24 21:50:38,895 >> Saving model checkpoint to saves/Custom/lora/train_2024-07-24-15-00-21/checkpoint-462
"attention_bias": false, "attention_dropout": 0.0, "auto_map": { "AutoConfig": "microsoft/Phi-3-medium-128k-instruct--configuration_phi3.Phi3Config", "AutoModelForCausalLM": "microsoft/Phi-3-medium-128k-instruct--modeling_phi3.Phi3ForCausalLM" }, "bos_token_id": 1, "embd_pdrop": 0.0, "eos_token_id": 32000, "hidden_act": "silu", "hidden_size": 5120, "initializer_range": 0.02, "intermediate_size": 17920, "max_position_embeddings": 131072, "model_type": "phi3", "num_attention_heads": 40, "num_hidden_layers": 40, "num_key_value_heads": 10, "original_max_position_embeddings": 4096, "pad_token_id": null, "resid_pdrop": 0.0, "rms_norm_eps": 1e-05, "rope_scaling": { "long_factor": [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.25, 1.25, 1.5, 2.0, 2.75, 5.75, 5.75, 6.5, 9.25, 11.0, 13.25, 19.25, 19.75, 19.75, 21.25, 21.5, 26.5, 30.0, 33.75, 35.25, 38.5, 42.0, 42.25, 46.0, 47.0, 50.0, 50.5, 51.0, 52.0, 52.75, 53.75, 54.75, 57.0, 57.25, 58.5, 59.25, 59.5, 62.0, 62.5, 62.75, 63.25, 63.25, 63.25, 63.75, 64.0, 64.0, 64.25, 64.5, 64.5, 65.0, 65.0 ], "short_factor": [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.01, 1.02, 1.02, 1.04, 1.04, 1.07, 1.07, 1.1, 1.3000000000000003, 1.3000000000000003, 1.5000000000000004, 1.5700000000000005, 1.9000000000000008, 2.3100000000000014, 2.759999999999992, 3.3899999999999784, 3.9399999999999666, 4.009999999999965, 4.289999999999959, 4.349999999999958, 5.349999999999937, 6.659999999999909, 7.029999999999901, 7.51999999999989, 8.00999999999988, 8.249999999999876, 8.279999999999875, 9.629999999999846, 9.89999999999984, 10.589999999999826, 11.049999999999816, 11.7899999999998, 12.189999999999792, 12.889999999999777, 13.129999999999772, 13.16999999999977, 13.20999999999977, 13.479999999999764, 13.539999999999763, 13.779999999999758, 13.929999999999755, 14.429999999999744, 14.759999999999737, 15.149999999999729, 15.419999999999723, 15.53999999999972, 15.659999999999718, 15.749999999999716, 15.759999999999716, 15.799999999999715, 16.05999999999971, 16.079999999999714, 16.11999999999972, 16.11999999999972, 16.18999999999973, 16.31999999999975, 16.539999999999786, 16.799999999999827 ], "type": "su" }, "rope_theta": 10000.0, "sliding_window": 131072, "tie_word_embeddings": false, "torch_dtype": "bfloat16", "transformers_version": "4.43.1", "use_cache": true, "vocab_size": 32064 } [INFO|tokenization_utils_base.py:2702] 2024-07-24 21:50:39,135 >> tokenizer config file saved in saves/Custom/lora/train_2024-07-24-15-00-21/checkpoint-462/tokenizer_config.json [INFO|tokenization_utils_base.py:2711] 2024-07-24 21:50:39,135 >> Special tokens file saved in saves/Custom/lora/train_2024-07-24-15-00-21/checkpoint-462/special_tokens_map.json [INFO|trainer.py:2394] 2024-07-24 21:50:39,852 >> Training completed. 
[INFO|trainer.py:3503] 2024-07-24 21:50:42,012 >> Saving model checkpoint to saves/Custom/lora/train_2024-07-24-15-00-21
[INFO|configuration_utils.py:733] 2024-07-24 21:50:42,204 >> loading configuration file config.json from cache at /workspace/data/huggingface-cache/hub/models--microsoft--Phi-3-medium-128k-instruct/snapshots/cae1d42b5577398fd1be9f0746052562ae552886/config.json
[INFO|configuration_utils.py:800] 2024-07-24 21:50:42,205 >> Model config Phi3Config {
  "_name_or_path": "Phi-3-medium-128k-instruct",
  "architectures": [ "Phi3ForCausalLM" ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "auto_map": { "AutoConfig": "microsoft/Phi-3-medium-128k-instruct--configuration_phi3.Phi3Config", "AutoModelForCausalLM": "microsoft/Phi-3-medium-128k-instruct--modeling_phi3.Phi3ForCausalLM" },
  "bos_token_id": 1,
  "embd_pdrop": 0.0,
  "eos_token_id": 32000,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 17920,
  "max_position_embeddings": 131072,
  "model_type": "phi3",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 10,
  "original_max_position_embeddings": 4096,
  "pad_token_id": null,
  "resid_pdrop": 0.0,
  "rms_norm_eps": 1e-05,
  "rope_scaling": {
    "long_factor": [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.25, 1.25, 1.5, 2.0, 2.75, 5.75, 5.75, 6.5, 9.25, 11.0, 13.25, 19.25, 19.75, 19.75, 21.25, 21.5, 26.5, 30.0, 33.75, 35.25, 38.5, 42.0, 42.25, 46.0, 47.0, 50.0, 50.5, 51.0, 52.0, 52.75, 53.75, 54.75, 57.0, 57.25, 58.5, 59.25, 59.5, 62.0, 62.5, 62.75, 63.25, 63.25, 63.25, 63.75, 64.0, 64.0, 64.25, 64.5, 64.5, 65.0, 65.0 ],
    "short_factor": [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.01, 1.02, 1.02, 1.04, 1.04, 1.07, 1.07, 1.1, 1.3000000000000003, 1.3000000000000003, 1.5000000000000004, 1.5700000000000005, 1.9000000000000008, 2.3100000000000014, 2.759999999999992, 3.3899999999999784, 3.9399999999999666, 4.009999999999965, 4.289999999999959, 4.349999999999958, 5.349999999999937, 6.659999999999909, 7.029999999999901, 7.51999999999989, 8.00999999999988, 8.249999999999876, 8.279999999999875, 9.629999999999846, 9.89999999999984, 10.589999999999826, 11.049999999999816, 11.7899999999998, 12.189999999999792, 12.889999999999777, 13.129999999999772, 13.16999999999977, 13.20999999999977, 13.479999999999764, 13.539999999999763, 13.779999999999758, 13.929999999999755, 14.429999999999744, 14.759999999999737, 15.149999999999729, 15.419999999999723, 15.53999999999972, 15.659999999999718, 15.749999999999716, 15.759999999999716, 15.799999999999715, 16.05999999999971, 16.079999999999714, 16.11999999999972, 16.11999999999972, 16.18999999999973, 16.31999999999975, 16.539999999999786, 16.799999999999827 ],
    "type": "su"
  },
  "rope_theta": 10000.0,
  "sliding_window": 131072,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.1",
  "use_cache": true,
  "vocab_size": 32064
}
[INFO|tokenization_utils_base.py:2702] 2024-07-24 21:50:42,263 >> tokenizer config file saved in saves/Custom/lora/train_2024-07-24-15-00-21/tokenizer_config.json
[INFO|tokenization_utils_base.py:2711] 2024-07-24 21:50:42,264 >> Special tokens file saved in saves/Custom/lora/train_2024-07-24-15-00-21/special_tokens_map.json
[WARNING|ploting.py:89] 2024-07-24 21:50:42,678 >> No metric eval_loss to plot.
[WARNING|ploting.py:89] 2024-07-24 21:50:42,678 >> No metric eval_accuracy to plot.
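For deployment without the PEFT dependency, the adapter can also be folded back into a full-precision copy of the base model. A minimal sketch, assuming enough memory to hold the un-quantized 14B weights (merging directly into a 4-bit base is lossy, so the usual practice is to reload the base in bf16 first); the output path is hypothetical:

```python
import torch
from peft import AutoPeftModelForCausalLM

# Reload the base model recorded in the adapter config in bf16, apply the adapter
# from the logged save dir, then fold the LoRA deltas into the base weights.
merged = AutoPeftModelForCausalLM.from_pretrained(
    "saves/Custom/lora/train_2024-07-24-15-00-21",   # from the log
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).merge_and_unload()

merged.save_pretrained("phi3-medium-128k-it-merged")     # hypothetical output dir
tokenizer.save_pretrained("phi3-medium-128k-it-merged")  # `tokenizer` from the first sketch
```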
[INFO|modelcard.py:449] 2024-07-24 21:50:42,679 >> Dropping the following result as it does not have all the necessary fields: {'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}