Some weights of the model checkpoint at ./model_dir were not used when initializing BertModel

#30
by ericxian1997 - opened

Some weights of the model checkpoint at ./model_dir were not used when initializing BertModel: ['encoder.layer.2.mlp.wo.bias', 'encoder.layer.11.mlp.wo.weight', 'encoder.layer.0.mlp.layernorm.weight', 'encoder.layer.5.mlp.gated_layers.weight', 'encoder.layer.7.mlp.wo.weight', 'encoder.layer.8.mlp.wo.weight', 'encoder.layer.3.mlp.wo.weight', 'encoder.layer.1.mlp.layernorm.bias', 'encoder.layer.8.mlp.layernorm.weight', 'encoder.layer.3.mlp.layernorm.weight', 'encoder.layer.5.mlp.wo.bias', 'encoder.layer.7.mlp.wo.bias', 'encoder.layer.9.mlp.layernorm.bias', 'encoder.layer.10.mlp.wo.weight', 'encoder.layer.11.mlp.layernorm.weight', 'encoder.layer.0.mlp.wo.weight', 'encoder.layer.8.mlp.wo.bias', 'encoder.layer.7.mlp.gated_layers.weight', 'encoder.layer.0.mlp.layernorm.bias', 'encoder.layer.11.mlp.gated_layers.weight', 'encoder.layer.3.mlp.wo.bias', 'encoder.layer.4.mlp.gated_layers.weight', 'encoder.layer.2.mlp.layernorm.bias', 'encoder.layer.9.mlp.wo.bias', 'encoder.layer.5.mlp.layernorm.weight', 'encoder.layer.10.mlp.layernorm.weight', 'encoder.layer.6.mlp.layernorm.bias', 'encoder.layer.2.mlp.gated_layers.weight', 'encoder.layer.4.mlp.layernorm.weight', 'encoder.layer.6.mlp.wo.bias', 'encoder.layer.7.mlp.layernorm.bias', 'encoder.layer.10.mlp.layernorm.bias', 'encoder.layer.0.mlp.gated_layers.weight', 'encoder.layer.4.mlp.wo.bias', 'encoder.layer.6.mlp.layernorm.weight', 'encoder.layer.2.mlp.wo.weight', 'encoder.layer.3.mlp.gated_layers.weight', 'encoder.layer.9.mlp.wo.weight', 'encoder.layer.7.mlp.layernorm.weight', 'encoder.layer.0.mlp.wo.bias', 'encoder.layer.10.mlp.gated_layers.weight', 'encoder.layer.4.mlp.layernorm.bias', 'encoder.layer.11.mlp.wo.bias', 'encoder.layer.8.mlp.layernorm.bias', 'encoder.layer.3.mlp.layernorm.bias', 'encoder.layer.1.mlp.gated_layers.weight', 'encoder.layer.5.mlp.layernorm.bias', 'encoder.layer.4.mlp.wo.weight', 'encoder.layer.1.mlp.wo.bias', 'encoder.layer.1.mlp.layernorm.weight', 'encoder.layer.2.mlp.layernorm.weight', 
'encoder.layer.8.mlp.gated_layers.weight', 'encoder.layer.5.mlp.wo.weight', 'encoder.layer.10.mlp.wo.bias', 'encoder.layer.9.mlp.layernorm.weight', 'encoder.layer.11.mlp.layernorm.bias', 'encoder.layer.1.mlp.wo.weight', 'encoder.layer.6.mlp.wo.weight', 'encoder.layer.9.mlp.gated_layers.weight', 'encoder.layer.6.mlp.gated_layers.weight']

  • This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of BertModel were not initialized from the model checkpoint at ./model_dir and are newly initialized: ['encoder.layer.7.intermediate.dense.weight', 'encoder.layer.10.intermediate.dense.weight', 'encoder.layer.11.output.dense.weight', 'encoder.layer.8.output.dense.weight', 'encoder.layer.9.output.LayerNorm.weight', 'encoder.layer.9.intermediate.dense.bias', 'encoder.layer.3.intermediate.dense.weight', 'encoder.layer.8.intermediate.dense.weight', 'encoder.layer.6.output.dense.bias', 'encoder.layer.1.output.dense.bias', 'encoder.layer.0.intermediate.dense.bias', 'encoder.layer.8.intermediate.dense.bias', 'encoder.layer.1.output.LayerNorm.weight', 'encoder.layer.5.intermediate.dense.bias', 'encoder.layer.5.output.dense.bias', 'encoder.layer.1.intermediate.dense.weight', 'encoder.layer.4.intermediate.dense.bias', 'encoder.layer.2.output.LayerNorm.bias', 'encoder.layer.7.output.LayerNorm.weight', 'encoder.layer.11.output.dense.bias', 'encoder.layer.7.intermediate.dense.bias', 'encoder.layer.1.intermediate.dense.bias', 'encoder.layer.7.output.dense.weight', 'encoder.layer.8.output.LayerNorm.weight', 'encoder.layer.8.output.LayerNorm.bias', 'encoder.layer.8.output.dense.bias', 'encoder.layer.11.output.LayerNorm.bias', 'encoder.layer.3.output.dense.bias', 'encoder.layer.9.output.LayerNorm.bias', 'encoder.layer.2.intermediate.dense.weight', 'encoder.layer.11.output.LayerNorm.weight', 'encoder.layer.4.output.dense.bias', 'encoder.layer.1.output.LayerNorm.bias', 'encoder.layer.9.output.dense.weight', 'encoder.layer.6.intermediate.dense.bias', 'encoder.layer.1.output.dense.weight', 'encoder.layer.3.output.LayerNorm.bias', 'encoder.layer.2.output.dense.bias', 'encoder.layer.4.intermediate.dense.weight', 'encoder.layer.0.output.dense.bias', 'encoder.layer.4.output.dense.weight', 'encoder.layer.5.output.dense.weight', 'embeddings.position_embeddings.weight', 'encoder.layer.5.output.LayerNorm.weight', 'encoder.layer.2.intermediate.dense.bias', 
'encoder.layer.3.output.LayerNorm.weight', 'encoder.layer.6.output.LayerNorm.weight', 'encoder.layer.0.output.LayerNorm.bias', 'encoder.layer.11.intermediate.dense.weight', 'encoder.layer.10.output.dense.weight', 'encoder.layer.4.output.LayerNorm.weight', 'encoder.layer.0.output.LayerNorm.weight', 'encoder.layer.0.output.dense.weight', 'encoder.layer.5.output.LayerNorm.bias', 'encoder.layer.9.intermediate.dense.weight', 'encoder.layer.3.intermediate.dense.bias', 'encoder.layer.5.intermediate.dense.weight', 'encoder.layer.4.output.LayerNorm.bias', 'encoder.layer.10.intermediate.dense.bias', 'encoder.layer.7.output.dense.bias', 'encoder.layer.9.output.dense.bias', 'encoder.layer.2.output.LayerNorm.weight', 'encoder.layer.2.output.dense.weight', 'encoder.layer.6.output.dense.weight', 'encoder.layer.10.output.LayerNorm.weight', 'encoder.layer.6.intermediate.dense.weight', 'encoder.layer.0.intermediate.dense.weight', 'encoder.layer.10.output.dense.bias', 'encoder.layer.11.intermediate.dense.bias', 'encoder.layer.10.output.LayerNorm.bias', 'encoder.layer.6.output.LayerNorm.bias', 'encoder.layer.7.output.LayerNorm.bias', 'encoder.layer.3.output.dense.weight']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

Were you able to solve it?

This usually happens when trust_remote_code=True is not passed to AutoModel.from_pretrained. If that does not solve your problem, can you share your code and the version of the transformers package you used to load the model?
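For reference, here is a minimal sketch of that suggestion. It assumes the checkpoint in ./model_dir ships its own modeling code, which the log suggests: the unused keys (mlp.gated_layers, mlp.wo) do not exist in the stock BertModel, while the stock intermediate.dense and output.dense weights end up newly initialized. The helper names `load_with_remote_code` and `installed_transformers_version`, and the `loader` parameter, are hypothetical and only there for illustration.

```python
from importlib import metadata


def installed_transformers_version():
    """Return the installed transformers version string, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


def load_with_remote_code(model_dir, loader=None):
    """Load a checkpoint, allowing the custom modeling code it ships with to run.

    `loader` is a hypothetical hook for this sketch; it defaults to
    AutoModel.from_pretrained.
    """
    if loader is None:
        # Imported lazily so the helper can be defined without transformers installed.
        from transformers import AutoModel

        loader = AutoModel.from_pretrained
    # trust_remote_code=True executes Python code from the checkpoint's repository,
    # so only enable it for sources you trust.
    return loader(model_dir, trust_remote_code=True)
```

Calling `load_with_remote_code("./model_dir")` should then build the model from the checkpoint's own architecture instead of the stock BertModel. If the warning persists, the value returned by `installed_transformers_version()` is the version number to include in a follow-up post.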

bwang0911 changed discussion status to closed
