This is a masked language model trained on the IMDB dataset by finetuning DistilBERT.
### Note: 
I published a tutorial explaining how transformers work and how to train a masked language model with the transformers library: https://olafenwaayoola.medium.com/the-concept-of-transformers-and-training-a-transformers-model-45a09ae7fb50
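
## Quick Test with the Fill-Mask Pipeline
For a quick local test, the model can also be queried through the transformers `fill-mask` pipeline. This is a minimal sketch, assuming transformers and PyTorch are installed; the variable names are only illustrative.
``` python
from transformers import pipeline

# Load the model through the fill-mask pipeline
unmasker = pipeline("fill-mask", model="ayoolaolafenwa/Masked-Language-Model")

# Print each candidate sentence with its confidence score
for prediction in unmasker("Washington DC is the [MASK] of USA."):
    print(prediction["sequence"], prediction["score"])
```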

# REST API Code for Testing the Masked Language Model

Python code for testing the masked language model with the hosted Inference API.
``` python
import requests

API_URL = "https://api-inference.huggingface.co/models/ayoolaolafenwa/Masked-Language-Model"
headers = {"Authorization": "Bearer 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"}

def query(payload):
	response = requests.post(API_URL, headers=headers, json=payload)
	return response.json()
	
output = query({
	"inputs": "Washington DC is the [MASK] of USA.",
})
print(output[0]["sequence"])
```

Output
```
washington dc is the capital of usa.
```
It produces the correct output, *washington dc is the capital of usa.*
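
The hosted Inference API can return an error payload instead of predictions while the model is still loading. The helper below is a minimal sketch assuming that behavior; `query_with_retry`, `max_retries`, and `wait_seconds` are illustrative names, and the bearer token should be your own Hugging Face access token.
``` python
import time

import requests

API_URL = "https://api-inference.huggingface.co/models/ayoolaolafenwa/Masked-Language-Model"
headers = {"Authorization": "Bearer <your Hugging Face access token>"}

def query_with_retry(payload, max_retries=5, wait_seconds=10):
    # Retry while the hosted model is still loading and the API
    # returns an error payload instead of predictions.
    for _ in range(max_retries):
        response = requests.post(API_URL, headers=headers, json=payload)
        result = response.json()
        if isinstance(result, dict) and "error" in result:
            time.sleep(wait_seconds)
            continue
        return result
    raise RuntimeError("Model did not become available in time")

output = query_with_retry({"inputs": "Washington DC is the [MASK] of USA."})
print(output[0]["sequence"])
```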

## Load the Masked Language Model with Transformers
You can load the language model with the transformers library using the code below.
``` python
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("ayoolaolafenwa/Masked-Language-Model")

model = AutoModelForMaskedLM.from_pretrained("ayoolaolafenwa/Masked-Language-Model") 

inputs = tokenizer("The internet [MASK] amazing.", return_tensors="pt")


with torch.no_grad():
    logits = model(**inputs).logits

# retrieve index of [MASK]
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
output = tokenizer.decode(predicted_token_id)
print(output)
```
Output
```
is 
```

It prints out the predicted masked word *is*.
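
## Top Predictions for the Masked Token
To inspect more than one candidate for the masked position, the same logits can be ranked with `torch.topk`. This is a minimal sketch extending the snippet above; the variable name `top_token_ids` and the choice of `k=5` are only illustrative.
``` python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("ayoolaolafenwa/Masked-Language-Model")
model = AutoModelForMaskedLM.from_pretrained("ayoolaolafenwa/Masked-Language-Model")

inputs = tokenizer("The internet [MASK] amazing.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Index of the [MASK] token in the input
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

# Rank the 5 most likely tokens for the masked position and decode each one
top_token_ids = torch.topk(logits[0, mask_token_index], k=5, dim=-1).indices[0]
for token_id in top_token_ids.tolist():
    print(tokenizer.decode([token_id]))
```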