config.json error
#1 opened by RaccoonOnion
Why does the config.json of this model show this inconsistent structure?
"architectures": [
"MistralForCausalLM"
],
And
"model_type": "mistral",
When I load the model via .from_pretrained, I get these errors:
You are using a model of type mistral to instantiate a model of type phi3. This is not supported for all configurations of models and can yield errors.
and
Some weights of Phi3ForSFT were not initialized from the model checkpoint at unsloth/Phi-3-medium-4k-instruct-bnb-4bit and are newly initialized: ['model.layers.0.mlp.gate_up_proj.weight', 'model.layers.0.self_attn.qkv_proj.weight', 'model.layers.1.mlp.gate_up_proj.weight', 'model.layers.1.self_attn.qkv_proj.weight', 'model.layers.10.mlp.gate_up_proj.weight', 'model.layers.10.self_attn.qkv_proj.weight', 'model.layers.11.mlp.gate_up_proj.weight', 'model.layers.11.self_attn.qkv_proj.weight', 'model.layers.12.mlp.gate_up_proj.weight', 'model.layers.12.self_attn.qkv_proj.weight', 'model.layers.13.mlp.gate_up_proj.weight', 'model.layers.13.self_attn.qkv_proj.weight', 'model.layers.14.mlp.gate_up_proj.weight', 'model.layers.14.self_attn.qkv_proj.weight', 'model.layers.15.mlp.gate_up_proj.weight', 'model.layers.15.self_attn.qkv_proj.weight', 'model.layers.16.mlp.gate_up_proj.weight', 'model.layers.16.self_attn.qkv_proj.weight', 'model.layers.17.mlp.gate_up_proj.weight', 'model.layers.17.self_attn.qkv_proj.weight', 'model.layers.18.mlp.gate_up_proj.weight', 'model.layers.18.self_attn.qkv_proj.weight', 'model.layers.19.mlp.gate_up_proj.weight', 'model.layers.19.self_attn.qkv_proj.weight', 'model.layers.2.mlp.gate_up_proj.weight', 'model.layers.2.self_attn.qkv_proj.weight', 'model.layers.20.mlp.gate_up_proj.weight', 'model.layers.20.self_attn.qkv_proj.weight', 'model.layers.21.mlp.gate_up_proj.weight', 'model.layers.21.self_attn.qkv_proj.weight', 'model.layers.22.mlp.gate_up_proj.weight', 'model.layers.22.self_attn.qkv_proj.weight', 'model.layers.23.mlp.gate_up_proj.weight', 'model.layers.23.self_attn.qkv_proj.weight', 'model.layers.24.mlp.gate_up_proj.weight', 'model.layers.24.self_attn.qkv_proj.weight', 'model.layers.25.mlp.gate_up_proj.weight', 'model.layers.25.self_attn.qkv_proj.weight', 'model.layers.26.mlp.gate_up_proj.weight', 'model.layers.26.self_attn.qkv_proj.weight', 'model.layers.27.mlp.gate_up_proj.weight', 'model.layers.27.self_attn.qkv_proj.weight', 'model.layers.28.mlp.gate_up_proj.weight', 'model.layers.28.self_attn.qkv_proj.weight', 'model.layers.29.mlp.gate_up_proj.weight', 'model.layers.29.self_attn.qkv_proj.weight', 'model.layers.3.mlp.gate_up_proj.weight', 'model.layers.3.self_attn.qkv_proj.weight', 'model.layers.30.mlp.gate_up_proj.weight', 'model.layers.30.self_attn.qkv_proj.weight', 'model.layers.31.mlp.gate_up_proj.weight', 'model.layers.31.self_attn.qkv_proj.weight', 'model.layers.32.mlp.gate_up_proj.weight', 'model.layers.32.self_attn.qkv_proj.weight', 'model.layers.33.mlp.gate_up_proj.weight', 'model.layers.33.self_attn.qkv_proj.weight', 'model.layers.34.mlp.gate_up_proj.weight', 'model.layers.34.self_attn.qkv_proj.weight', 'model.layers.35.mlp.gate_up_proj.weight', 'model.layers.35.self_attn.qkv_proj.weight', 'model.layers.36.mlp.gate_up_proj.weight', 'model.layers.36.self_attn.qkv_proj.weight', 'model.layers.37.mlp.gate_up_proj.weight', 'model.layers.37.self_attn.qkv_proj.weight', 'model.layers.38.mlp.gate_up_proj.weight', 'model.layers.38.self_attn.qkv_proj.weight', 'model.layers.39.mlp.gate_up_proj.weight', 'model.layers.39.self_attn.qkv_proj.weight', 'model.layers.4.mlp.gate_up_proj.weight', 'model.layers.4.self_attn.qkv_proj.weight', 'model.layers.5.mlp.gate_up_proj.weight', 'model.layers.5.self_attn.qkv_proj.weight', 'model.layers.6.mlp.gate_up_proj.weight', 'model.layers.6.self_attn.qkv_proj.weight', 'model.layers.7.mlp.gate_up_proj.weight', 'model.layers.7.self_attn.qkv_proj.weight', 'model.layers.8.mlp.gate_up_proj.weight', 
'model.layers.8.self_attn.qkv_proj.weight', 'model.layers.9.mlp.gate_up_proj.weight', 'model.layers.9.self_attn.qkv_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Note that Phi3ForSFT is just a wrapper over Phi3Model. Any chance the wrong model was uploaded to this repo?
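For reference, the wrapper is roughly this shape (a simplified sketch, not the exact class):

```python
from transformers import Phi3Config, Phi3Model, Phi3PreTrainedModel

class Phi3ForSFT(Phi3PreTrainedModel):
    """Thin wrapper over Phi3Model (simplified sketch; training head omitted)."""
    config_class = Phi3Config

    def __init__(self, config):
        super().__init__(config)
        self.model = Phi3Model(config)
        self.post_init()

    def forward(self, input_ids=None, attention_mask=None, **kwargs):
        # Delegate straight to the underlying Phi3Model.
        return self.model(input_ids=input_ids, attention_mask=attention_mask, **kwargs)

# Loading a phi3-architecture class from a checkpoint whose config says
# "model_type": "mistral" produces exactly the warnings quoted above:
# the Mistral-named weights can't populate phi3's fused qkv_proj /
# gate_up_proj parameters, so those come back newly initialized.
model = Phi3ForSFT.from_pretrained("unsloth/Phi-3-medium-4k-instruct-bnb-4bit")
```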
I saw issues on another Phi-3 repo saying that unsloth "mistralizes" the model. It would be helpful if you could put a notice on the page so future users aren't confused.
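In the meantime, a possible workaround (assuming the checkpoint really is stored in the Mistral layout that its config.json describes) is to let transformers pick the class from the config instead of forcing a phi3 class:

```python
from transformers import AutoModelForCausalLM

# Let transformers instantiate the class named in config.json
# ("MistralForCausalLM") so the weight names line up. Requires
# bitsandbytes, since this is a pre-quantized 4-bit checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Phi-3-medium-4k-instruct-bnb-4bit",
    device_map="auto",
)
```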
RaccoonOnion changed discussion status to closed