RuntimeError: Error(s) in loading state_dict for MixFormerSequentialForCausalLM

#14
by nmd2k - opened

Hi, I recently fine-tuned phi-1.5. However, I wasn't able to load its checkpoint. It seems like the config of MixFormerSequentialForCausalLM has been modified.
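
For context, the checkpoint was loaded roughly like this (a minimal sketch reconstructed from the traceback below; the checkpoint path is taken from the log and any other arguments are assumptions):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint_dir = "/datadrive05/dungnm31/Exp/phi15/checkpoint-3450/"  # fine-tuned checkpoint

# trust_remote_code is needed because MixFormerSequentialForCausalLM is defined in the
# repo's custom modeling_mixformer_sequential.py (see auto_map in the config below),
# not inside the transformers library itself.
tokenizer = AutoTokenizer.from_pretrained(checkpoint_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint_dir, trust_remote_code=True)
```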

Detailed log:

WARNING:torch.distributed.run:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
[2023-09-13 13:10:40,735] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2023-09-13 13:10:41,210] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
loading configuration file /datadrive05/dungnm31/Exp/phi15/checkpoint-3450/config.json
loading configuration file /datadrive05/dungnm31/Exp/phi15/checkpoint-3450/config.json
Model config MixFormerSequentialConfig {
  "_name_or_path": "/datadrive05/dungnm31/Exp/phi15/checkpoint-3450/",
  "activation_function": "gelu_new",
  "architecture": {
    "block_cls": "parallel",
    "mixer": {},
    "mlp": {
      "mlp_cls": "mlp"
    }
  },
  "architectures": [
    "MixFormerSequentialForCausalLM"
  ],
  "auto_map": {
    "AutoConfig": "microsoft/phi-1_5--configuration_mixformer_sequential.MixFormerSequentialConfig",
    "AutoModelForCausalLM": "microsoft/phi-1_5--modeling_mixformer_sequential.MixFormerSequentialForCausalLM"
  },
  "embd_layer": "default",
  "embd_pdrop": 0.0,
  "initializer_range": 0.02,
  "layer_norm_epsilon": 1e-05,
  "model_type": "mixformer-sequential",
  "n_embd": 2048,
  "n_head": 32,
  "n_inner": null,
  "n_layer": 24,
  "n_positions": 2048,
  "phyagi_version": "0.0.4.dev",
  "resid_pdrop": 0.0,
  "rotary_dim": 32,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.34.0.dev0",
  "vocab_size": 50304
}

loading file vocab.json
loading file merges.txt
loading file tokenizer.json
loading file added_tokens.json
loading file special_tokens_map.json
loading file tokenizer_config.json
loading weights file /datadrive05/dungnm31/Exp/phi15/checkpoint-3450/pytorch_model.bin
Generate config GenerationConfig {
  "_from_model_config": true,
  "transformers_version": "4.34.0.dev0"
}

Traceback (most recent call last):
  File "/datadrive05/dungnm31/inst/main.py", line 150, in <module>
    main()
  File "/datadrive05/dungnm31/inst/main.py", line 91, in main
    model = AutoModelForCausalLM.from_pretrained(model_args.model_name_or_path,
  File "/home/dungnm31/.local/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
    return model_class.from_pretrained(
  File "/home/dungnm31/.local/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3180, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/dungnm31/.local/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3629, in _load_pretrained_model
    raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")
RuntimeError: Error(s) in loading state_dict for MixFormerSequentialForCausalLM:
        size mismatch for layers.0.wte.weight: copying a param with shape torch.Size([50296, 2048]) from checkpoint, the shape in current model is torch.Size([50304, 2048]).
        size mismatch for layers.25.linear.weight: copying a param with shape torch.Size([50296, 2048]) from checkpoint, the shape in current model is torch.Size([50304, 2048]).
        size mismatch for layers.25.linear.bias: copying a param with shape torch.Size([50296]) from checkpoint, the shape in current model is torch.Size([50304]).
        You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.
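
For reference, here is how the loader's own suggestion could be applied, plus an alternative (a minimal sketch; note that `ignore_mismatched_sizes=True` re-initializes the mismatched tensors instead of restoring the fine-tuned weights, and the `vocab_size = 50296` value in the second option is only an assumption read off the shapes in the error above):

```python
from transformers import AutoConfig, AutoModelForCausalLM

checkpoint_dir = "/datadrive05/dungnm31/Exp/phi15/checkpoint-3450/"

# Option 1 (the suggestion from the error message): skip the mismatched tensors.
# Caveat: the skipped rows of the embedding (layers.0.wte) and the output head
# (layers.25.linear) are freshly initialized, so those fine-tuned weights are lost.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint_dir,
    trust_remote_code=True,
    ignore_mismatched_sizes=True,
)

# Option 2 (assumption: the checkpoint was saved with a 50296-token vocabulary while
# config.json still says 50304): rebuild the model with a config whose vocab_size
# matches the saved weights, so all tensors load cleanly.
config = AutoConfig.from_pretrained(checkpoint_dir, trust_remote_code=True)
config.vocab_size = 50296  # match the shapes reported in the size-mismatch error
model = AutoModelForCausalLM.from_pretrained(
    checkpoint_dir,
    config=config,
    trust_remote_code=True,
)
```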
nmd2k changed discussion status to closed
