ryo0634 committed
Commit a3bacf3
1 Parent(s): cc5e9e8

Update README.md

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -45,6 +45,21 @@ This is the mLUKE base model with 12 hidden layers, 768 hidden size. The total n
  of parameters in this model is 585M (278M for the word embeddings and encoder, 307M for the entity embeddings).
  The model was initialized with the weights of XLM-RoBERTa(base) and trained using the December 2020 version of Wikipedia in 24 languages.
 
+
+ ## Note
+ When you load the model with `AutoModel.from_pretrained` using the default configuration, you will see the following warning:
+
+ ```
+ Some weights of the model checkpoint at studio-ousia/mluke-base-lite were not used when initializing LukeModel: [
+ 'luke.encoder.layer.0.attention.self.w2e_query.weight', 'luke.encoder.layer.0.attention.self.w2e_query.bias',
+ 'luke.encoder.layer.0.attention.self.e2w_query.weight', 'luke.encoder.layer.0.attention.self.e2w_query.bias',
+ 'luke.encoder.layer.0.attention.self.e2e_query.weight', 'luke.encoder.layer.0.attention.self.e2e_query.bias',
+ ...]
+ ```
+
+ These weights implement the entity-aware attention mechanism described in [the LUKE paper](https://arxiv.org/abs/2010.01057).
+ This warning is expected: `use_entity_aware_attention` is set to `false` by default, but the pretrained checkpoint retains these weights so that they can be loaded when you enable `use_entity_aware_attention`.
+
  ### Citation
 
  If you find mLUKE useful for your work, please cite the following paper:
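
For reference, here is a minimal sketch of the two loading paths the note above describes. This example is illustrative and not part of the commit; it assumes the standard `transformers` behavior that `from_pretrained` forwards extra keyword arguments (here `use_entity_aware_attention`, a `LukeConfig` option) to the model config.

```python
from transformers import AutoModel

# Default configuration: use_entity_aware_attention is false for this
# checkpoint, so the w2e/e2w/e2e query weights are skipped and the
# warning shown above is printed.
model = AutoModel.from_pretrained("studio-ousia/mluke-base-lite")

# Overriding use_entity_aware_attention via a config kwarg makes
# from_pretrained load the entity-aware attention weights from the
# checkpoint instead of discarding them.
model = AutoModel.from_pretrained(
    "studio-ousia/mluke-base-lite",
    use_entity_aware_attention=True,
)
```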