ryo0634 committed
Commit 07facce
1 Parent(s): 82423db

Update README.md

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -45,6 +45,21 @@ This is the mLUKE large model with 24 hidden layers, 768 hidden size. The total
  of parameters in this model is 868M (561M for the word embeddings and encoder, 307M for the entity embeddings).
  The model was initialized with the weights of XLM-RoBERTa (large) and trained using the December 2020 version of Wikipedia in 24 languages.
 
+ ## Note
+ When you load the model from `AutoModel.from_pretrained` with the default configuration, you will see the following warning:
+
+ ```
+ Some weights of the model checkpoint at studio-ousia/mluke-base-lite were not used when initializing LukeModel: [
+ 'luke.encoder.layer.0.attention.self.w2e_query.weight', 'luke.encoder.layer.0.attention.self.w2e_query.bias',
+ 'luke.encoder.layer.0.attention.self.e2w_query.weight', 'luke.encoder.layer.0.attention.self.e2w_query.bias',
+ 'luke.encoder.layer.0.attention.self.e2e_query.weight', 'luke.encoder.layer.0.attention.self.e2e_query.bias',
+ ...]
+ ```
+
+ These are the weights for entity-aware attention (described in [the LUKE paper](https://arxiv.org/abs/2010.01057)).
+ This is expected: `use_entity_aware_attention` is set to `false` by default, but the pretrained checkpoint still contains these weights so that they are loaded if you enable `use_entity_aware_attention`.
+
  ### Citation
 
  If you find mLUKE useful for your work, please cite the following paper:
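For context, here is a minimal sketch of how the note added above plays out in practice. It assumes the standard `transformers` `AutoConfig`/`AutoModel` API and uses the `studio-ousia/mluke-base-lite` checkpoint mentioned in the warning; it is an illustration, not part of the commit itself.

```python
from transformers import AutoConfig, AutoModel

# Default load: `use_entity_aware_attention` is false in the checkpoint config,
# so the w2e/e2w/e2e query weights are skipped and the warning above is printed.
model = AutoModel.from_pretrained("studio-ousia/mluke-base-lite")

# Override the config to enable entity-aware attention; the extra query weights
# stored in the checkpoint are then loaded instead of being discarded.
config = AutoConfig.from_pretrained(
    "studio-ousia/mluke-base-lite", use_entity_aware_attention=True
)
model = AutoModel.from_pretrained("studio-ousia/mluke-base-lite", config=config)
```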