This model was pretrained on the bookcorpus dataset using knowledge distillation.

The particularity of this model is that, although it shares the same architecture as BERT, it has a hidden size of 240. Since it has 12 attention heads, its head size is 20, unlike the BERT base model whose head size is 64.

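As an illustration, these dimensions can be read directly from the checkpoint's configuration (a minimal sketch; the exact values printed depend on the checkpoint you load):

````python
from transformers import AutoConfig

# Inspect the architecture of the published checkpoint
config = AutoConfig.from_pretrained("eli4s/Bert-L12-h256-A4")
head_size = config.hidden_size // config.num_attention_heads

print(config.hidden_size, config.num_attention_heads, head_size)
````
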
The knowledge distillation was performed using multiple loss functions.

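The exact losses are not detailed here. Purely as an illustration, a multi-loss distillation objective often combines a soft-target term on the teacher's logits with a hard-label masked-LM term; the terms and weighting below are assumptions, not the recipe actually used:

````python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft-target term: KL divergence between teacher and student distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard-label term: the usual masked-LM cross-entropy on the true tokens
    hard = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))
    return alpha * soft + (1 - alpha) * hard
````
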
The weights of the model were initialized from scratch.

PS: the tokenizer is the same as that of the bert-base-uncased model.

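Since the vocabulary is shared with bert-base-uncased, the tokenizer can be loaded from either checkpoint; a quick, purely illustrative sanity check:

````python
from transformers import BertTokenizer

# The two tokenizers should produce identical tokenization
tok_model = BertTokenizer.from_pretrained("eli4s/Bert-L12-h256-A4")
tok_bert = BertTokenizer.from_pretrained("bert-base-uncased")

print(tok_model.tokenize("Let's have a look.") == tok_bert.tokenize("Let's have a look."))
````
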
To load the model & tokenizer:

````python
from transformers import AutoModelForMaskedLM, BertTokenizer

model_name = "eli4s/Bert-L12-h256-A4"
model = AutoModelForMaskedLM.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)
````

To use it as a masked language model:

````python
import torch

sentence = "Let's have a [MASK]."

model.eval()
inputs = tokenizer([sentence], padding='longest', return_tensors='pt')
with torch.no_grad():
    output = model(inputs['input_ids'], attention_mask=inputs['attention_mask'])

# Locate the [MASK] token (id 103 in the bert-base-uncased vocabulary)
mask_index = inputs['input_ids'].tolist()[0].index(tokenizer.mask_token_id)
masked_token = output['logits'][0][mask_index].argmax(dim=-1)
predicted_token = tokenizer.decode(masked_token)

print(predicted_token)
````

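For reference, a similar prediction can usually be obtained with the transformers fill-mask pipeline (a sketch; it assumes the checkpoint exposes the standard masked-LM head, which the loading code above relies on as well):

````python
from transformers import pipeline

# Build a fill-mask pipeline directly from the published checkpoint
fill_mask = pipeline("fill-mask", model="eli4s/Bert-L12-h256-A4")
print(fill_mask("Let's have a [MASK]."))
````
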
Or we can retrieve the n most relevant predictions:

````python
top_n = 5

vocab_size = model.config.vocab_size
logits = output['logits'][0][mask_index].tolist()
# Rank every token of the vocabulary by its logit and keep the top_n candidates
top_tokens = sorted(range(vocab_size), key=lambda i: logits[i], reverse=True)[:top_n]

print(tokenizer.decode(top_tokens))
````

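To attach a score to each candidate, the logits at the masked position can be turned into probabilities with a softmax (a small illustrative extension of the snippet above):

````python
import torch

# Convert the logits at the masked position into probabilities
probs = torch.softmax(output['logits'][0][mask_index], dim=-1)
for token_id in top_tokens:
    print(tokenizer.decode([token_id]), round(probs[token_id].item(), 4))
````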