Ans committed on
Commit
85ca6f3
2 Parent(s): 644b9df f40165c

Merge branch 'main' of https://github.com/ansfarooq7/l4-project into main

Files changed (1)
  1. protoypes/L4_Project_first.ipynb +800 -0
protoypes/L4_Project_first.ipynb ADDED
@@ -0,0 +1,800 @@
+ {
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "L4 Project.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ "<a href=\"https://colab.research.google.com/github/ansfarooq7/l4-project/blob/main/protoypes/L4_Project_first.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "tGd7LV0Qtxwz"
+ },
+ "source": [
+ "# Code from https://ramsrigoutham.medium.com/sized-fill-in-the-blank-or-multi-mask-filling-with-roberta-and-huggingface-transformers-58eb9e7fb0c"
+ ],
+ "execution_count": 47,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "KuvW3fmor6Tu",
+ "outputId": "0f032f97-e518-432d-cf2b-b52bc0e6b80d"
+ },
+ "source": [
+ "!pip install transformers\n",
+ "import torch"
+ ],
+ "execution_count": 48,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: transformers in /usr/local/lib/python3.7/dist-packages (4.11.3)\n",
+ "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.7/dist-packages (from transformers) (21.0)\n",
+ "Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers) (2.23.0)\n",
+ "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (1.19.5)\n",
+ "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.7/dist-packages (from transformers) (6.0)\n",
+ "Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.7/dist-packages (from transformers) (4.8.1)\n",
+ "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (2019.12.20)\n",
+ "Requirement already satisfied: huggingface-hub>=0.0.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (0.0.19)\n",
+ "Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers) (3.3.0)\n",
+ "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers) (4.62.3)\n",
+ "Requirement already satisfied: tokenizers<0.11,>=0.10.1 in /usr/local/lib/python3.7/dist-packages (from transformers) (0.10.3)\n",
+ "Requirement already satisfied: sacremoses in /usr/local/lib/python3.7/dist-packages (from transformers) (0.0.46)\n",
+ "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from huggingface-hub>=0.0.17->transformers) (3.7.4.3)\n",
+ "Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging>=20.0->transformers) (2.4.7)\n",
+ "Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata->transformers) (3.6.0)\n",
+ "Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (1.24.3)\n",
+ "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2021.5.30)\n",
+ "Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (3.0.4)\n",
+ "Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2.10)\n",
+ "Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.0.1)\n",
+ "Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.15.0)\n",
+ "Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (7.1.2)\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "AIRsE899r-53",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "f1181c17-41fd-4d23-befc-484bae0a5481"
+ },
+ "source": [
+ "from transformers import RobertaTokenizer, RobertaForMaskedLM\n",
+ "tokenizer = RobertaTokenizer.from_pretrained('roberta-base')\n",
+ "model = RobertaForMaskedLM.from_pretrained('roberta-base')\n",
+ "\n",
+ "def set_seed(seed: int):\n",
+ " \"\"\"\n",
+ " Helper function for reproducible behavior to set the seed in ``random``, ``numpy``, ``torch`` and/or ``tf`` (if\n",
+ " installed).\n",
+ "\n",
+ " Args:\n",
+ " seed (:obj:`int`): The seed to set.\n",
+ " \"\"\"\n",
+ " #random.seed(seed)\n",
+ " #np.random.seed(seed)\n",
+ " #if is_torch_available():\n",
+ " torch.manual_seed(seed)\n",
+ " torch.cuda.manual_seed_all(seed)\n",
+ " # ^^ safe to call this function even if cuda is not available\n",
+ " #if is_tf_available():\n",
+ " #tf.random.set_seed(seed)\n",
+ " \n",
+ "def get_prediction(sent):\n",
+ " \n",
+ " token_ids = tokenizer.encode(sent, return_tensors='pt')\n",
+ " masked_position = (token_ids.squeeze() == tokenizer.mask_token_id).nonzero()\n",
+ " masked_pos = [mask.item() for mask in masked_position ]\n",
+ "\n",
+ " with torch.no_grad():\n",
+ " output = model(token_ids)\n",
+ "\n",
+ " last_hidden_state = output[0].squeeze()\n",
+ "\n",
+ " list_of_list =[]\n",
+ " for index,mask_index in enumerate(masked_pos):\n",
+ " mask_hidden_state = last_hidden_state[mask_index]\n",
+ " idx = torch.topk(mask_hidden_state, k=5, dim=0)[1]\n",
+ " words = [tokenizer.decode(i.item()).strip() for i in idx]\n",
+ " list_of_list.append(words)\n",
+ " print(\"Mask \",index+1,\"Guesses : \",words)\n",
+ " \n",
+ " best_guess = \"\"\n",
+ " for j in list_of_list:\n",
+ " best_guess = best_guess+\" \"+j[0]\n",
+ " \n",
+ " return best_guess\n",
+ "\n",
+ "sentence = \"Manchester United are ___ ___ ___ champions.\"\n",
+ "print (\"Original Sentence: \",sentence)\n",
+ "sentence = sentence.replace(\"___\",\"<mask>\")\n",
+ "print (\"Original Sentence replaced with mask: \",sentence)\n",
+ "print (\"\\n\")\n",
+ "\n",
+ "predicted_blanks = get_prediction(sentence)\n",
+ "print (\"\\nBest guess for fill in the blank :::\",predicted_blanks)"
+ ],
+ "execution_count": 49,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: Manchester United are ___ ___ ___ champions.\n",
+ "Original Sentence replaced with mask: Manchester United are <mask> <mask> <mask> champions.\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['the', 'defending', 'currently', 'reigning', 'crowned']\n",
+ "Mask 2 Guesses : ['reigning', 'defending', 'the', 'crowned', 'Premier']\n",
+ "Mask 3 Guesses : ['League', 'league', 'defending', 'world', 'four']\n",
+ "\n",
+ "Best guess for fill in the blank ::: the reigning League\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "dDeTSyiisEJJ",
+ "outputId": "ce4cfa3c-875f-4225-ab2e-dd7b4e2a012a"
+ },
+ "source": [
+ "from transformers import pipeline\n",
+ "text_generation = pipeline(\"text-generation\")\n",
+ "\n",
+ "limericks = []\n",
+ "set_seed(31)\n",
+ "\n",
+ "starting_words = [[\"That\", \"Had\", \"Not\", \"But\", \"That\"], \n",
+ " [\"There\", \"Who\", \"She\", \"Tormenting\", \"Til\"],\n",
+ " [\"Relentless\", \"This\", \"First\", \"and\", \"then\"],\n",
+ " [\"There\", \"Who\", \"That\", \"To\", \"She\"],\n",
+ " [\"There\", \"Who\", \"Two\", \"Four\", \"Have\"]]\n",
+ "\n",
+ "rhyming_words = [[\"told\", \"bold\", \"woodchuck\", \"truck\", \"road\"], \n",
+ " [\"Nice\", \"grease\", \"house\", \"spouse\", \"peace\"],\n",
+ " [\"deadlines\", \"lines\", \"edits\", \"credits\", \"wine\"],\n",
+ " [\"Lynn\", \"thin\", \"essayed\", \"lemonade\", \"in\"],\n",
+ " [\"beard\", \"feared\", \"hen\", \"wren\", \"beard\"]]\n",
+ "\n",
+ "for i in range(len(starting_words)):\n",
+ " limerick = \"\"\n",
+ "\n",
+ " for j in range(5):\n",
+ " gpt2_sentence = text_generation(starting_words[i][j], max_length=3, do_sample=False)[0]\n",
+ " sentence = gpt2_sentence['generated_text'] + \" ___ ___ ___ \" + rhyming_words[i][j]\n",
+ " print(\"Original Sentence: \",sentence)\n",
+ " sentence = sentence.replace(\"___\",\"<mask>\")\n",
+ " print(\"Original Sentence replaced with mask: \",sentence)\n",
+ " print(\"\\n\")\n",
+ "\n",
+ " predicted_blanks = get_prediction(sentence)\n",
+ " print(\"\\nBest guess for fill in the blank: \", predicted_blanks)\n",
+ " limerick = limerick + gpt2_sentence['generated_text'] + predicted_blanks + \" \" + rhyming_words[i][j] + \"\\n\"\n",
+ "\n",
+ " limericks.append(limerick)\n",
+ "\n",
+ "print(\"\\n\")\n",
+ "print(f\"Generated {len(limericks)} limericks: \\n\")\n",
+ "for limerick in limericks:\n",
+ " print(limerick)"
+ ],
+ "execution_count": 50,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)\n",
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n",
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: That is why ___ ___ ___ told\n",
+ "Original Sentence replaced with mask: That is why <mask> <mask> <mask> told\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['you', 'the', 'we', 'I', 'they']\n",
+ "Mask 2 Guesses : ['will', 'should', 'must', 'is', 'are']\n",
+ "Mask 3 Guesses : ['be', 'been', 'not', 'never', 'being']\n",
+ "\n",
+ "Best guess for fill in the blank: you will be\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: Had the same ___ ___ ___ bold\n",
+ "Original Sentence replaced with mask: Had the same <mask> <mask> <mask> bold\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['font', 'color', 'name', 'text', 'letters']\n",
+ "Mask 2 Guesses : [',', 'but', '.', 'color', 'and']\n",
+ "Mask 3 Guesses : ['in', ':', 'but', 'be', ',']\n",
+ "\n",
+ "Best guess for fill in the blank: font , in\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: Not the only ___ ___ ___ woodchuck\n",
+ "Original Sentence replaced with mask: Not the only <mask> <mask> <mask> woodchuck\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['one', 'non', 'species', 'kind', 'American']\n",
+ "Mask 2 Guesses : ['-', 'of', 'for', 'in', 'with']\n",
+ "Mask 3 Guesses : ['the', 'a', 'eating', 'y', 'American']\n",
+ "\n",
+ "Best guess for fill in the blank: one - the\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: But the most ___ ___ ___ truck\n",
+ "Original Sentence replaced with mask: But the most <mask> <mask> <mask> truck\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['important', 'dangerous', 'valuable', 'interesting', 'expensive']\n",
+ "Mask 2 Guesses : ['is', ':', '-', '?', 'kind']\n",
+ "Mask 3 Guesses : ['the', 'a', 'pickup', 'delivery', 'is']\n",
+ "\n",
+ "Best guess for fill in the blank: important is the\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: That is why ___ ___ ___ road\n",
+ "Original Sentence replaced with mask: That is why <mask> <mask> <mask> road\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['I', 'we', 'they', 'you', 'he']\n",
+ "Mask 2 Guesses : ['built', 'closed', 'build', 'cross', 'need']\n",
+ "Mask 3 Guesses : ['the', 'this', 'a', 'that', 'my']\n",
+ "\n",
+ "Best guess for fill in the blank: I built the\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: There is no ___ ___ ___ Nice\n",
+ "Original Sentence replaced with mask: There is no <mask> <mask> <mask> Nice\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['way', 'one', 'other', 'good', 'more']\n",
+ "Mask 2 Guesses : ['to', 'city', 'way', 'state', 'of']\n",
+ "Mask 3 Guesses : ['in', 'for', 'of', 'to', 'than']\n",
+ "\n",
+ "Best guess for fill in the blank: way to in\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: Who is the ___ ___ ___ grease\n",
+ "Original Sentence replaced with mask: Who is the <mask> <mask> <mask> grease\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['master', 'king', 'man', 'god', 'boss']\n",
+ "Mask 2 Guesses : ['of', 'in', '?', 'with', 'behind']\n",
+ "Mask 3 Guesses : ['elbow', 'the', 'of', 'bacon', 'in']\n",
+ "\n",
+ "Best guess for fill in the blank: master of elbow\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n",
+ "Input length of input_ids is 3, but ``max_length`` is set to 3.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: She, who ___ ___ ___ house\n",
+ "Original Sentence replaced with mask: She, who <mask> <mask> <mask> house\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['lived', ',', 'lives', 'never', 'owns']\n",
+ "Mask 2 Guesses : ['in', ',', 'a', 'the', 'her']\n",
+ "Mask 3 Guesses : ['the', 'her', 'own', 'a', ',']\n",
+ "\n",
+ "Best guess for fill in the blank: lived in the\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: Tormenting the ___ ___ ___ spouse\n",
+ "Original Sentence replaced with mask: Tormenting the <mask> <mask> <mask> spouse\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['life', 'relationship', 'soul', 'marriage', 'love']\n",
+ "Mask 2 Guesses : ['of', 'with', 'and', 'in', 'for']\n",
+ "Mask 3 Guesses : ['the', 'your', 'a', 'its', 'their']\n",
+ "\n",
+ "Best guess for fill in the blank: life of the\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n",
+ "Input length of input_ids is 3, but ``max_length`` is set to 3.This can lead to unexpected behavior. You should consider increasing ``config.max_length`` or ``max_length``.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: Tilted ___ ___ ___ peace\n",
+ "Original Sentence replaced with mask: Tilted <mask> <mask> <mask> peace\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['by', ',', 'war', 'over', 'water']\n",
+ "Mask 2 Guesses : [',', 'of', ':', 'to', 'for']\n",
+ "Mask 3 Guesses : ['of', 'for', 'world', 'not', 'fragile']\n",
+ "\n",
+ "Best guess for fill in the blank: by , of\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: Relentless, ___ ___ ___ deadlines\n",
+ "Original Sentence replaced with mask: Relentless, <mask> <mask> <mask> deadlines\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['focused', 'un', 'self', 'never', 'determined']\n",
+ "Mask 2 Guesses : ['-', 'to', ',', 'and', 'of']\n",
+ "Mask 3 Guesses : ['meet', 'of', 'to', ',', 'and']\n",
+ "\n",
+ "Best guess for fill in the blank: focused - meet\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: This is a ___ ___ ___ lines\n",
+ "Original Sentence replaced with mask: This is a <mask> <mask> <mask> lines\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['list', 'sample', 'function', 'sequence', 'summary']\n",
+ "Mask 2 Guesses : ['of', 'with', 'for', 'between', 'from']\n",
+ "Mask 3 Guesses : ['the', 'these', 'two', 'some', 'those']\n",
+ "\n",
+ "Best guess for fill in the blank: list of the\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: First, the ___ ___ ___ edits\n",
+ "Original Sentence replaced with mask: First, the <mask> <mask> <mask> edits\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['most', 'first', 'big', 'more', 'final']\n",
+ "Mask 2 Guesses : ['-', 'and', 'of', ',', 'for']\n",
+ "Mask 3 Guesses : ['the', 'and', 'of', 'final', 'my']\n",
+ "\n",
+ "Best guess for fill in the blank: most - the\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: and, in ___ ___ ___ credits\n",
+ "Original Sentence replaced with mask: and, in <mask> <mask> <mask> credits\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['addition', 'the', 'particular', 'part', 'conjunction']\n",
+ "Mask 2 Guesses : [',', 'to', 'of', 'the', 'with']\n",
+ "Mask 3 Guesses : ['the', 'closing', 'film', ',', 'movie']\n",
+ "\n",
+ "Best guess for fill in the blank: addition , the\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: then, the ___ ___ ___ wine\n",
+ "Original Sentence replaced with mask: then, the <mask> <mask> <mask> wine\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['next', 'wine', 'whole', 'first', 'final']\n",
+ "Mask 2 Guesses : ['of', 'is', ':', ',', 'question']\n",
+ "Mask 3 Guesses : ['red', 'white', 'the', 'of', 'fine']\n",
+ "\n",
+ "Best guess for fill in the blank: next of red\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: There is no ___ ___ ___ Lynn\n",
+ "Original Sentence replaced with mask: There is no <mask> <mask> <mask> Lynn\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['cure', 'way', 'end', 'right', 'justice']\n",
+ "Mask 2 Guesses : ['for', 'to', 'of', 'in', 'like']\n",
+ "Mask 3 Guesses : ['Christopher', 'Lord', 'Jeremy', 'Chris', 'Richard']\n",
+ "\n",
+ "Best guess for fill in the blank: cure for Christopher\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: Who is the ___ ___ ___ thin\n",
+ "Original Sentence replaced with mask: Who is the <mask> <mask> <mask> thin\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['person', 'guy', 'one', 'woman', 'man']\n",
+ "Mask 2 Guesses : ['who', 'with', 'that', '?', 'not']\n",
+ "Mask 3 Guesses : ['is', 'looks', 'and', 'too', \"'s\"]\n",
+ "\n",
+ "Best guess for fill in the blank: person who is\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: That is why ___ ___ ___ essayed\n",
+ "Original Sentence replaced with mask: That is why <mask> <mask> <mask> essayed\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['I', 'the', 'he', 'this', 'you']\n",
+ "Mask 2 Guesses : ['was', 'is', 'have', 'has', 'will']\n",
+ "Mask 3 Guesses : ['was', 'is', 'be', 'being', 'been']\n",
+ "\n",
+ "Best guess for fill in the blank: I was was\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: To the next ___ ___ ___ lemonade\n",
+ "Original Sentence replaced with mask: To the next <mask> <mask> <mask> lemonade\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['level', 'step', 'day', 'generation', 'round']\n",
+ "Mask 2 Guesses : [',', ':', '</s>', 'of', '.']\n",
+ "Mask 3 Guesses : ['more', 'the', 'of', 'make', 'More']\n",
+ "\n",
+ "Best guess for fill in the blank: level , more\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: She, who ___ ___ ___ in\n",
+ "Original Sentence replaced with mask: She, who <mask> <mask> <mask> in\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['is', 'was', 'has', 'I', 'had']\n",
+ "Mask 2 Guesses : ['a', ',', 'an', 'also', 'been']\n",
+ "Mask 3 Guesses : ['interested', 'live', 'involved', 'born', 'lives']\n",
+ "\n",
+ "Best guess for fill in the blank: is a interested\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: There is no ___ ___ ___ beard\n",
+ "Original Sentence replaced with mask: There is no <mask> <mask> <mask> beard\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['reason', 'need', 'point', 'place', 'room']\n",
+ "Mask 2 Guesses : ['for', 'to', 'in', 'with', 'of']\n",
+ "Mask 3 Guesses : ['a', 'the', 'your', 'no', 'his']\n",
+ "\n",
+ "Best guess for fill in the blank: reason for a\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: Who is the ___ ___ ___ feared\n",
+ "Original Sentence replaced with mask: Who is the <mask> <mask> <mask> feared\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['person', 'man', 'terrorist', 'leader', 'one']\n",
+ "Mask 2 Guesses : ['who', 'leader', 'and', ',', 'that']\n",
+ "Mask 3 Guesses : ['is', 'most', 'and', 'be', 'are']\n",
+ "\n",
+ "Best guess for fill in the blank: person who is\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: Two, the ___ ___ ___ hen\n",
+ "Original Sentence replaced with mask: Two, the <mask> <mask> <mask> hen\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['mother', 'm', 'f', 'hen', 'n']\n",
+ "Mask 2 Guesses : ['of', '-', 'and', \"'s\", ',']\n",
+ "Mask 3 Guesses : ['the', 'a', 'mother', 'he', 'house']\n",
+ "\n",
+ "Best guess for fill in the blank: mother of the\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.\n"
+ ]
+ },
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Original Sentence: Four, the ___ ___ ___ wren\n",
+ "Original Sentence replaced with mask: Four, the <mask> <mask> <mask> wren\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['m', 'great', 'k', 'last', 'n']\n",
+ "Mask 2 Guesses : ['of', '-', 'and', ',', 'th']\n",
+ "Mask 3 Guesses : ['the', 'of', 'a', 'ian', \"'s\"]\n",
+ "\n",
+ "Best guess for fill in the blank: m of the\n",
+ "Original Sentence: Have the same ___ ___ ___ beard\n",
+ "Original Sentence replaced with mask: Have the same <mask> <mask> <mask> beard\n",
+ "\n",
+ "\n",
+ "Mask 1 Guesses : ['dark', 'hair', 'full', 'long', 'gray']\n",
+ "Mask 2 Guesses : ['-', 'and', 'as', 'but', ',']\n",
+ "Mask 3 Guesses : ['your', 'and', 'a', 'no', 'white']\n",
+ "\n",
+ "Best guess for fill in the blank: dark - your\n",
+ "\n",
+ "\n",
+ "Generated 5 limericks: \n",
+ "\n",
+ "That is why you will be told\n",
+ "Had the same font , in bold\n",
+ "Not the only one - the woodchuck\n",
+ "But the most important is the truck\n",
+ "That is why I built the road\n",
+ "\n",
+ "There is no way to in Nice\n",
+ "Who is the master of elbow grease\n",
+ "She, who lived in the house\n",
+ "Tormenting the life of the spouse\n",
+ "Tilted by , of peace\n",
+ "\n",
+ "Relentless, focused - meet deadlines\n",
+ "This is a list of the lines\n",
+ "First, the most - the edits\n",
+ "and, in addition , the credits\n",
+ "then, the next of red wine\n",
+ "\n",
+ "There is no cure for Christopher Lynn\n",
+ "Who is the person who is thin\n",
+ "That is why I was was essayed\n",
+ "To the next level , more lemonade\n",
+ "She, who is a interested in\n",
+ "\n",
+ "There is no reason for a beard\n",
+ "Who is the person who is feared\n",
+ "Two, the mother of the hen\n",
+ "Four, the m of the wren\n",
+ "Have the same dark - your beard\n",
+ "\n"
+ ]
+ }
+ ]
+ }
+ ]
+ }
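The notebook added above chains a GPT-2 line opener with RoBERTa multi-mask filling: it swaps each `___` placeholder for the model's `<mask>` token, takes the top-ranked prediction at every mask position, and stitches those guesses back into the line. The glue logic can be sketched without loading either model; `fake_top_guesses` below is a hypothetical stand-in for the per-mask top-k lists that `torch.topk` over RoBERTa's logits would produce in a real run.

```python
MASK = "<mask>"  # RoBERTa's mask token; GPT-2 plays no part in this step

def to_masked(sentence: str, blank: str = "___") -> str:
    """Replace each ___ placeholder with the mask token, as the notebook does."""
    return sentence.replace(blank, MASK)

def best_guess(guesses_per_mask) -> str:
    """Join the top-ranked word for each mask, mirroring get_prediction()."""
    return " ".join(words[0] for words in guesses_per_mask)

def fill(sentence: str, guesses_per_mask) -> str:
    """Substitute the best guesses back into the original template, one blank at a time."""
    out = sentence
    for words in guesses_per_mask:
        out = out.replace("___", words[0], 1)
    return out

# Hypothetical top guesses for the three masks (in the notebook these come
# from torch.topk over RoBERTa's output at each mask position).
fake_top_guesses = [
    ["the", "defending", "currently"],
    ["reigning", "defending", "the"],
    ["League", "league", "defending"],
]

template = "Manchester United are ___ ___ ___ champions."
print(to_masked(template))            # → Manchester United are <mask> <mask> <mask> champions.
print(best_guess(fake_top_guesses))   # → the reigning League
print(fill(template, fake_top_guesses))
```

This separation also makes it easy to swap in a different scoring rule (e.g. beam search over the per-mask candidates instead of greedily taking each top word), which is one fix for incoherent fills like "I was was" seen in the outputs.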