Commit 094a1a2 by eli4s (parent: fe8ff71): Create README.md

Files changed (1): README.md (+50, -0)

This model was pretrained on the bookcorpus dataset using knowledge distillation.

A particularity of this model is that, although it shares the same architecture as BERT, it has a hidden size of 240. Since it has 12 attention heads, the head size (20) differs from that of the BERT base model (64).

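As a quick sanity check on those numbers, the head size is simply the hidden size divided by the number of attention heads; the snippet below is plain arithmetic on the figures quoted above (it does not read anything from the checkpoint).

````python
# Head size = hidden size / number of attention heads.
hidden_size, num_heads = 240, 12              # figures quoted above for this model
bert_base_hidden, bert_base_heads = 768, 12   # standard BERT base configuration

print(hidden_size // num_heads)               # -> 20
print(bert_base_hidden // bert_base_heads)    # -> 64
````
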
The knowledge distillation was performed using multiple loss functions.

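The card does not say which losses were combined, so the sketch below is only an illustration of what a multi-loss distillation objective often looks like (masked-language-modelling cross-entropy, a temperature-scaled KL term on the teacher's soft logits, and a cosine term on hidden states, in the spirit of DistilBERT). The `teacher` outputs, the `alpha_*` weights and the temperature are all assumptions, not values from this training run.

````python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits,
                      student_hidden, teacher_hidden, labels,
                      temperature=2.0, alpha_mlm=1.0, alpha_kl=1.0, alpha_cos=1.0):
    """Illustrative multi-loss distillation objective (hypothetical weights)."""
    # Masked-language-modelling cross-entropy on the hard labels.
    mlm = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                          labels.view(-1), ignore_index=-100)
    # Temperature-scaled KL divergence towards the teacher's soft predictions.
    kl = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                  F.softmax(teacher_logits / temperature, dim=-1),
                  reduction="batchmean") * temperature ** 2
    # Cosine alignment of hidden states (assumes both are projected to a common size).
    cos = 1.0 - F.cosine_similarity(student_hidden, teacher_hidden, dim=-1).mean()
    return alpha_mlm * mlm + alpha_kl * kl + alpha_cos * cos
````
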
The weights of the model were initialized from scratch.

PS: the tokenizer is the same as that of the bert-base-uncased model.

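Since the tokenizer is shared with bert-base-uncased, it could equivalently be loaded from that checkpoint; this is just an illustration of the note above, while the snippet in the next section loads it from the model's own repository.

````python
from transformers import BertTokenizer

# Equivalent to loading the tokenizer from this model's repository (see below).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
````
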
To load the model & tokenizer:

````python
from transformers import AutoModelForMaskedLM, BertTokenizer

# Load the distilled checkpoint and its tokenizer from the Hugging Face Hub.
model_name = "eli4s/Bert-L12-h256-A4"
model = AutoModelForMaskedLM.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)
````

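As an optional check that the loaded checkpoint matches the description above, a few configuration fields can be printed (the values come from the checkpoint itself; none are asserted here):

````python
# Inspect the architecture of the loaded checkpoint.
print(model.config.hidden_size, model.config.num_attention_heads)
print(f"{model.num_parameters():,} parameters")
````
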
To use it as a masked language model:

````python
import torch

sentence = "Let's have a [MASK]."

model.eval()
inputs = tokenizer([sentence], padding='longest', return_tensors='pt')
with torch.no_grad():
    output = model(inputs['input_ids'], attention_mask=inputs['attention_mask'])

# Locate the [MASK] position and take the highest-scoring token there.
mask_index = inputs['input_ids'].tolist()[0].index(tokenizer.mask_token_id)
masked_token = output['logits'][0][mask_index].argmax(dim=-1)
predicted_token = tokenizer.decode(masked_token)

print(predicted_token)
````

We can also retrieve the n most relevant predictions:

````python
top_n = 5

# Rank every vocabulary token by its logit at the masked position.
vocab_size = model.config.vocab_size
logits = output['logits'][0][mask_index].tolist()
top_tokens = sorted(range(vocab_size), key=lambda i: logits[i], reverse=True)[:top_n]

print(tokenizer.decode(top_tokens))
````
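
Equivalently, `torch.topk` gives the same n best token ids directly from the logits tensor; this is just a more idiomatic variant of the sort above, reusing `output`, `mask_index` and `top_n` from the previous snippets.

````python
import torch

# Same top-n selection, done on the logits tensor at the masked position.
top_ids = torch.topk(output['logits'][0][mask_index], k=top_n).indices
print(tokenizer.decode(top_ids))
````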