This model was pretrained on the bookcorpus dataset using knowledge distillation.

The particularity of this model is that, although it shares the same architecture as BERT, it has a hidden size of 240. Since it has 12 attention heads, its head size is 20, unlike the BERT base model whose head size is 64.

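As an illustration, these dimensions can be read directly from the checkpoint's configuration (a minimal sketch; the exact values printed depend on the checkpoint you load):

````python
from transformers import AutoConfig

# Inspect the architecture of the published checkpoint
config = AutoConfig.from_pretrained("eli4s/Bert-L12-h256-A4")
head_size = config.hidden_size // config.num_attention_heads

print(config.hidden_size, config.num_attention_heads, head_size)
````
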
The knowledge distillation was performed using multiple loss functions.

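The exact losses are not detailed here. Purely as an illustration, a multi-loss distillation objective often combines a soft-target term on the teacher's logits with a hard-label masked-LM term; the terms and weighting below are assumptions, not the recipe actually used:

````python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft-target term: KL divergence between teacher and student distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard-label term: the usual masked-LM cross-entropy on the true tokens
    hard = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)), labels.view(-1))
    return alpha * soft + (1 - alpha) * hard
````
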
The weights of the model were initialized from scratch.

PS: the tokenizer is the same as that of the bert-base-uncased model.

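Since the vocabulary is shared with bert-base-uncased, the tokenizer can be loaded from either checkpoint; a quick, purely illustrative sanity check:

````python
from transformers import BertTokenizer

# The two tokenizers should produce identical tokenization
tok_model = BertTokenizer.from_pretrained("eli4s/Bert-L12-h256-A4")
tok_bert = BertTokenizer.from_pretrained("bert-base-uncased")

print(tok_model.tokenize("Let's have a look.") == tok_bert.tokenize("Let's have a look."))
````
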
To load the model & tokenizer:

````python
from transformers import AutoModelForMaskedLM, BertTokenizer

model_name = "eli4s/Bert-L12-h256-A4"
model = AutoModelForMaskedLM.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)
````

To use it as a masked language model:

````python
import torch

sentence = "Let's have a [MASK]."

model.eval()
inputs = tokenizer([sentence], padding='longest', return_tensors='pt')
with torch.no_grad():
    output = model(inputs['input_ids'], attention_mask=inputs['attention_mask'])

# Locate the [MASK] token (id 103 in the bert-base-uncased vocabulary)
mask_index = inputs['input_ids'].tolist()[0].index(tokenizer.mask_token_id)
masked_token = output['logits'][0][mask_index].argmax(dim=-1)
predicted_token = tokenizer.decode(masked_token)

print(predicted_token)
````

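For reference, a similar prediction can usually be obtained with the transformers fill-mask pipeline (a sketch; it assumes the checkpoint exposes the standard masked-LM head, which the loading code above relies on as well):

````python
from transformers import pipeline

# Build a fill-mask pipeline directly from the published checkpoint
fill_mask = pipeline("fill-mask", model="eli4s/Bert-L12-h256-A4")
print(fill_mask("Let's have a [MASK]."))
````
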
Or we can retrieve the n most relevant predictions:

````python
top_n = 5

vocab_size = model.config.vocab_size
logits = output['logits'][0][mask_index].tolist()
# Rank every token of the vocabulary by its logit and keep the top_n candidates
top_tokens = sorted(range(vocab_size), key=lambda i: logits[i], reverse=True)[:top_n]

print(tokenizer.decode(top_tokens))
````

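To attach a score to each candidate, the logits at the masked position can be turned into probabilities with a softmax (a small illustrative extension of the snippet above):

````python
import torch

# Convert the logits at the masked position into probabilities
probs = torch.softmax(output['logits'][0][mask_index], dim=-1)
for token_id in top_tokens:
    print(tokenizer.decode([token_id]), round(probs[token_id].item(), 4))
````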