huseinzol05 committed
Commit 6e72ee3
1 Parent(s): f5abacf
.gitignore ADDED
@@ -0,0 +1 @@
+ *.ipynb_checkpoints
README.md ADDED
@@ -0,0 +1,69 @@
+ ---
+ language: ms
+ ---
+
+ # t5-small-bahasa-cased
+
+ Pretrained T5 small language model for Malay.
+
+ ## Pretraining Corpus
+
+ The `t5-small-bahasa-cased` model was pretrained on multiple tasks, listed below:
+
+ 1. Language masking on Bahasa news, Bahasa Wikipedia, Bahasa Academia.edu, Bahasa parliament and translated The Pile.
+ 2. News title prediction on Bahasa news.
+ 3. Next sentence prediction on Bahasa news, Bahasa Wikipedia, Bahasa Academia.edu, Bahasa parliament and translated The Pile.
+ 4. Translated Natural QA.
+ 5. Text similarity on translated SNLI and translated MNLI.
+ 6. EN-MS translation.
+ 7. MS-EN translation.
+ 8. Abstractive summarization.
+ 9. Knowledge graph triple generation.
+ 10. Paraphrasing.
+
+ The data preparation steps can be reproduced from https://github.com/huseinzol05/malaya/tree/master/pretrained-model/t5/prepare
+
+ ## Pretraining details
+
+ - This model was trained with the Google T5 repository, https://github.com/google-research/text-to-text-transfer-transformer, on a v3-8 TPU.
+ - All steps can be reproduced from https://github.com/huseinzol05/Malaya/tree/master/pretrained-model/t5
+
+ ## Load Pretrained Model
+
+ You can use this model by installing `torch` or `tensorflow` together with the Hugging Face `transformers` library, then initializing it directly:
+
+ ```python
+ from transformers import T5Tokenizer, T5Model
+
+ model = T5Model.from_pretrained('malay-huggingface/t5-small-bahasa-cased')
+ tokenizer = T5Tokenizer.from_pretrained('malay-huggingface/t5-small-bahasa-cased')
+ ```
+
+ ## Example using T5ForConditionalGeneration
+
+ ```python
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
+
+ tokenizer = T5Tokenizer.from_pretrained('malay-huggingface/t5-small-bahasa-cased')
+ model = T5ForConditionalGeneration.from_pretrained('malay-huggingface/t5-small-bahasa-cased')
+ # 'soalan:' is the question prefix; the input asks "who is the prime minister of malaysia?"
+ input_ids = tokenizer.encode('soalan: siapakah perdana menteri malaysia?', return_tensors = 'pt')
+ outputs = model.generate(input_ids)
+ # skip_special_tokens drops the leading <pad> and trailing </s> from the decoded text
+ print(tokenizer.decode(outputs[0], skip_special_tokens = True))
+ ```
+
+ Output:
+
+ ```
+ Mahathir Mohamad
+ ```
+
+ ## Supported prefixes
+
+ 1. `soalan: {string}` ("question"), for question answering, trained on translated Natural QA.
+ 2. `ringkasan: {string}` ("summary"), for abstractive summarization.
+ 3. `tajuk: {string}` ("title"), for abstractive title generation.
+ 4. `parafrasa: {string}` ("paraphrase"), for abstractive paraphrasing.
+ 5. `terjemah Inggeris ke Melayu: {string}` ("translate English to Malay"), for EN-MS translation.
+ 6. `terjemah Melayu ke Inggeris: {string}` ("translate Malay to English"), for MS-EN translation.
+ 7. `grafik pengetahuan: {string}` ("knowledge graph"), for converting MS text into EN Knowledge Graph triples.
+ 8. `ayat1: {string1} ayat2: {string2}` ("sentence 1", "sentence 2"), for semantic similarity.
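The prefixes in the README are plain strings prepended to the input text. A minimal sketch of a helper that builds model inputs from them; the task keys (`"qa"`, `"en-ms"`, ...) and the function names are hypothetical, and only the prefix strings come from the README.

```python
# Prefix strings copied from the README's "Supported prefixes" list.
# The dictionary keys are hypothetical task names chosen for this sketch.
PREFIXES = {
    "qa": "soalan",
    "summarization": "ringkasan",
    "title": "tajuk",
    "paraphrase": "parafrasa",
    "en-ms": "terjemah Inggeris ke Melayu",
    "ms-en": "terjemah Melayu ke Inggeris",
    "knowledge-graph": "grafik pengetahuan",
}


def build_input(task: str, text: str) -> str:
    """Return `text` with its task prefix prepended, e.g. 'soalan: ...'."""
    if task not in PREFIXES:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(PREFIXES)}")
    return f"{PREFIXES[task]}: {text.strip()}"


def build_similarity_input(sentence1: str, sentence2: str) -> str:
    """Semantic similarity takes two sentences in the 'ayat1: ... ayat2: ...' format."""
    return f"ayat1: {sentence1.strip()} ayat2: {sentence2.strip()}"


if __name__ == "__main__":
    print(build_input("qa", "siapakah perdana menteri malaysia?"))
    # soalan: siapakah perdana menteri malaysia?
```

The resulting string is what gets passed to `tokenizer.encode(..., return_tensors = 'pt')`. For longer outputs such as translation or summarization, consider passing `max_length` to `model.generate`: with default settings transformers stops at 20 tokens, which is likely why the translation outputs in the notebook below end mid-sentence.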
config.json ADDED
@@ -0,0 +1,29 @@
+ {
+   "_name_or_path": "./pytorch_model.bin",
+   "architectures": [
+     "T5Model"
+   ],
+   "d_ff": 1344,
+   "d_kv": 64,
+   "d_model": 384,
+   "decoder_start_token_id": 0,
+   "dropout_rate": 0.1,
+   "eos_token_id": 1,
+   "feed_forward_proj": "relu",
+   "gradient_checkpointing": false,
+   "initializer_factor": 1.0,
+   "inputs_length": 512,
+   "is_encoder_decoder": true,
+   "layer_norm_epsilon": 1e-06,
+   "model_type": "t5",
+   "n_positions": 512,
+   "num_decoder_layers": 4,
+   "num_heads": 12,
+   "num_layers": 4,
+   "pad_token_id": 0,
+   "relative_attention_num_buckets": 32,
+   "torch_dtype": "float32",
+   "transformers_version": "4.10.0",
+   "use_cache": true,
+   "vocab_size": 32128
+ }
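The notebook's module printout shows the q/k/v projections mapping 384 to 768 features: T5 sizes its attention inner dimension as `num_heads * d_kv` (12 × 64 = 768), independently of `d_model`. The sketch below counts parameters from these config values, assuming the layer layout in that printout (bias-free `Linear` layers, scale-only layer norms, a relative-attention bias only in the first block of each stack, and one embedding matrix shared by encoder and decoder). It gives 34,759,680 parameters, or 139,038,720 bytes in float32, which is close to the 139,080,297-byte size in the `pytorch_model.bin` LFS pointer; the remaining ~41 KB is presumably serialization overhead.

```python
# Hyperparameters copied from config.json.
CONFIG = {
    "vocab_size": 32128,
    "d_model": 384,
    "d_kv": 64,
    "num_heads": 12,
    "d_ff": 1344,
    "num_layers": 4,
    "num_decoder_layers": 4,
    "relative_attention_num_buckets": 32,
}


def t5_param_count(c: dict) -> int:
    """Count T5Model parameters, assuming bias-free Linear layers and scale-only layer norms."""
    d = c["d_model"]
    inner = c["num_heads"] * c["d_kv"]  # 12 * 64 = 768, independent of d_model
    attention = 3 * d * inner + inner * d  # q, k, v: d -> inner; o: inner -> d
    feed_forward = 2 * d * c["d_ff"]  # wi: d -> d_ff; wo: d_ff -> d
    norm = d  # T5LayerNorm has a weight vector but no bias
    rel_bias = c["relative_attention_num_buckets"] * c["num_heads"]  # block 0 of each stack only

    encoder_block = (attention + norm) + (feed_forward + norm)
    decoder_block = 2 * (attention + norm) + (feed_forward + norm)  # self- and cross-attention
    encoder = c["num_layers"] * encoder_block + rel_bias + norm  # + final_layer_norm
    decoder = c["num_decoder_layers"] * decoder_block + rel_bias + norm
    embedding = c["vocab_size"] * d  # one matrix shared by encoder and decoder
    return embedding + encoder + decoder


params = t5_param_count(CONFIG)
print(f"{params:,} parameters")  # 34,759,680 parameters
print(f"{params * 4:,} bytes as float32")  # 139,038,720 bytes as float32
```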
convert-from-malaya.ipynb ADDED
@@ -0,0 +1,629 @@
+ {
+  "cells": [
+   {
+    "cell_type": "code",
+    "execution_count": 1,
+    "metadata": {
+     "scrolled": true
+    },
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "'4.10.0'"
+       ]
+      },
+      "execution_count": 1,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "import transformers\n",
+     "transformers.__version__"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 2,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "from transformers import T5Config, T5Model, load_tf_weights_in_t5"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 4,
+    "metadata": {},
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "checkpoint model.ckpt-1000000.index\r\n",
+       "model.ckpt-1000000.data-00000-of-00002 model.ckpt-1000000.meta\r\n",
+       "model.ckpt-1000000.data-00001-of-00002 operative_config.gin\r\n"
+      ]
+     }
+    ],
+    "source": [
+     "# !wget https://f000.backblazeb2.com/file/malaya-model/pretrained/t5-tiny-2021-07-28.tar.gz\n",
+     "# !tar -zxf t5-tiny-2021-07-28.tar.gz\n",
+     "# !rm t5-tiny-2021-07-28.tar.gz\n",
+     "!ls t5-tiny-v2"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 5,
+    "metadata": {},
+    "outputs": [
+     {
+      "name": "stdout",
+      "output_type": "stream",
+      "text": [
+       "T5Config {\n",
+       "  \"d_ff\": 1344,\n",
+       "  \"d_kv\": 64,\n",
+       "  \"d_model\": 384,\n",
+       "  \"decoder_start_token_id\": 0,\n",
+       "  \"dropout_rate\": 0.1,\n",
+       "  \"eos_token_id\": 1,\n",
+       "  \"feed_forward_proj\": \"relu\",\n",
+       "  \"gradient_checkpointing\": false,\n",
+       "  \"initializer_factor\": 1.0,\n",
+       "  \"inputs_length\": 512,\n",
+       "  \"is_encoder_decoder\": true,\n",
+       "  \"layer_norm_epsilon\": 1e-06,\n",
+       "  \"model_type\": \"t5\",\n",
+       "  \"n_positions\": 512,\n",
+       "  \"num_decoder_layers\": 4,\n",
+       "  \"num_heads\": 12,\n",
+       "  \"num_layers\": 4,\n",
+       "  \"pad_token_id\": 0,\n",
+       "  \"relative_attention_num_buckets\": 32,\n",
+       "  \"transformers_version\": \"4.10.0\",\n",
+       "  \"use_cache\": true,\n",
+       "  \"vocab_size\": 32128\n",
+       "}\n",
+       "\n"
+      ]
+     }
+    ],
+    "source": [
+     "config = T5Config(\n",
+     "    vocab_size = 32128,\n",
+     "    n_positions=512,\n",
+     "    d_ff = 1344,\n",
+     "    d_kv = 64,\n",
+     "    d_model = 384,\n",
+     "    dropout_rate = 0.1,\n",
+     "    inputs_length = 512,\n",
+     "    num_heads = 12,\n",
+     "    num_layers = 4,\n",
+     "    decoder_start_token_id = 0,\n",
+     "    eos_token_id = 1,\n",
+     "    pad_token_id = 0)\n",
+     "print(config)\n",
+     "config.save_pretrained('./')"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 6,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "T5Model(\n",
+        "  (shared): Embedding(32128, 384)\n",
+        "  (encoder): T5Stack(\n",
+        "    (embed_tokens): Embedding(32128, 384)\n",
+        "    (block): ModuleList(\n",
+        "      (0): T5Block(\n",
+        "        (layer): ModuleList(\n",
+        "          (0): T5LayerSelfAttention(\n",
+        "            (SelfAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "              (relative_attention_bias): Embedding(32, 12)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (1): T5LayerFF(\n",
+        "            (DenseReluDense): T5DenseReluDense(\n",
+        "              (wi): Linear(in_features=384, out_features=1344, bias=False)\n",
+        "              (wo): Linear(in_features=1344, out_features=384, bias=False)\n",
+        "              (dropout): Dropout(p=0.1, inplace=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "        )\n",
+        "      )\n",
+        "      (1): T5Block(\n",
+        "        (layer): ModuleList(\n",
+        "          (0): T5LayerSelfAttention(\n",
+        "            (SelfAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (1): T5LayerFF(\n",
+        "            (DenseReluDense): T5DenseReluDense(\n",
+        "              (wi): Linear(in_features=384, out_features=1344, bias=False)\n",
+        "              (wo): Linear(in_features=1344, out_features=384, bias=False)\n",
+        "              (dropout): Dropout(p=0.1, inplace=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "        )\n",
+        "      )\n",
+        "      (2): T5Block(\n",
+        "        (layer): ModuleList(\n",
+        "          (0): T5LayerSelfAttention(\n",
+        "            (SelfAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (1): T5LayerFF(\n",
+        "            (DenseReluDense): T5DenseReluDense(\n",
+        "              (wi): Linear(in_features=384, out_features=1344, bias=False)\n",
+        "              (wo): Linear(in_features=1344, out_features=384, bias=False)\n",
+        "              (dropout): Dropout(p=0.1, inplace=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "        )\n",
+        "      )\n",
+        "      (3): T5Block(\n",
+        "        (layer): ModuleList(\n",
+        "          (0): T5LayerSelfAttention(\n",
+        "            (SelfAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (1): T5LayerFF(\n",
+        "            (DenseReluDense): T5DenseReluDense(\n",
+        "              (wi): Linear(in_features=384, out_features=1344, bias=False)\n",
+        "              (wo): Linear(in_features=1344, out_features=384, bias=False)\n",
+        "              (dropout): Dropout(p=0.1, inplace=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "        )\n",
+        "      )\n",
+        "    )\n",
+        "    (final_layer_norm): T5LayerNorm()\n",
+        "    (dropout): Dropout(p=0.1, inplace=False)\n",
+        "  )\n",
+        "  (decoder): T5Stack(\n",
+        "    (embed_tokens): Embedding(32128, 384)\n",
+        "    (block): ModuleList(\n",
+        "      (0): T5Block(\n",
+        "        (layer): ModuleList(\n",
+        "          (0): T5LayerSelfAttention(\n",
+        "            (SelfAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "              (relative_attention_bias): Embedding(32, 12)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (1): T5LayerCrossAttention(\n",
+        "            (EncDecAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (2): T5LayerFF(\n",
+        "            (DenseReluDense): T5DenseReluDense(\n",
+        "              (wi): Linear(in_features=384, out_features=1344, bias=False)\n",
+        "              (wo): Linear(in_features=1344, out_features=384, bias=False)\n",
+        "              (dropout): Dropout(p=0.1, inplace=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "        )\n",
+        "      )\n",
+        "      (1): T5Block(\n",
+        "        (layer): ModuleList(\n",
+        "          (0): T5LayerSelfAttention(\n",
+        "            (SelfAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (1): T5LayerCrossAttention(\n",
+        "            (EncDecAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (2): T5LayerFF(\n",
+        "            (DenseReluDense): T5DenseReluDense(\n",
+        "              (wi): Linear(in_features=384, out_features=1344, bias=False)\n",
+        "              (wo): Linear(in_features=1344, out_features=384, bias=False)\n",
+        "              (dropout): Dropout(p=0.1, inplace=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "        )\n",
+        "      )\n",
+        "      (2): T5Block(\n",
+        "        (layer): ModuleList(\n",
+        "          (0): T5LayerSelfAttention(\n",
+        "            (SelfAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (1): T5LayerCrossAttention(\n",
+        "            (EncDecAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (2): T5LayerFF(\n",
+        "            (DenseReluDense): T5DenseReluDense(\n",
+        "              (wi): Linear(in_features=384, out_features=1344, bias=False)\n",
+        "              (wo): Linear(in_features=1344, out_features=384, bias=False)\n",
+        "              (dropout): Dropout(p=0.1, inplace=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "        )\n",
+        "      )\n",
+        "      (3): T5Block(\n",
+        "        (layer): ModuleList(\n",
+        "          (0): T5LayerSelfAttention(\n",
+        "            (SelfAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (1): T5LayerCrossAttention(\n",
+        "            (EncDecAttention): T5Attention(\n",
+        "              (q): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (k): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (v): Linear(in_features=384, out_features=768, bias=False)\n",
+        "              (o): Linear(in_features=768, out_features=384, bias=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "          (2): T5LayerFF(\n",
+        "            (DenseReluDense): T5DenseReluDense(\n",
+        "              (wi): Linear(in_features=384, out_features=1344, bias=False)\n",
+        "              (wo): Linear(in_features=1344, out_features=384, bias=False)\n",
+        "              (dropout): Dropout(p=0.1, inplace=False)\n",
+        "            )\n",
+        "            (layer_norm): T5LayerNorm()\n",
+        "            (dropout): Dropout(p=0.1, inplace=False)\n",
+        "          )\n",
+        "        )\n",
+        "      )\n",
+        "    )\n",
+        "    (final_layer_norm): T5LayerNorm()\n",
+        "    (dropout): Dropout(p=0.1, inplace=False)\n",
+        "  )\n",
+        ")"
+       ]
+      },
+      "execution_count": 6,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "model = T5Model(config)\n",
+     "load_tf_weights_in_t5(model, config, 't5-tiny-v2/model.ckpt-1000000')"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 7,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "('config.json', 'pytorch_model.bin')"
+       ]
+      },
+      "execution_count": 7,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "from transformers import CONFIG_NAME, WEIGHTS_NAME\n",
+     "CONFIG_NAME, WEIGHTS_NAME"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 8,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "import torch\n",
+     "\n",
+     "torch.save(model.state_dict(), './' + WEIGHTS_NAME)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 9,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "from transformers import T5Config, T5Model, T5Tokenizer"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 10,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "# !wget https://f000.backblazeb2.com/file/malaya-model/bpe/sp10m.cased.ms-en.model"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 11,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "('./tokenizer_config.json',\n",
+        " './special_tokens_map.json',\n",
+        " './spiece.model',\n",
+        " './added_tokens.json')"
+       ]
+      },
+      "execution_count": 11,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "tokenizer = T5Tokenizer('sp10m.cased.ms-en.model')\n",
+     "tokenizer.save_pretrained('./')"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 12,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "tokenizer = T5Tokenizer.from_pretrained('./', lower = False)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 13,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "config = T5Config.from_pretrained('./')"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 14,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "model = T5Model.from_pretrained('./pytorch_model.bin', config = config)"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 15,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "model.save_pretrained('./')"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 16,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "from transformers import T5Tokenizer, T5ForConditionalGeneration"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 17,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "model = T5ForConditionalGeneration.from_pretrained('./')"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 18,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "'<pad> Mahathir Mohamad</s>'"
+       ]
+      },
+      "execution_count": 18,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "input_ids = tokenizer.encode('soalan: siapakah perdana menteri malaysia?', return_tensors = 'pt')\n",
+     "outputs = model.generate(input_ids)\n",
+     "tokenizer.decode(outputs[0])"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 19,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "'<pad> PETALING JAYA: Bekas perdana menteri Najib Razak sudah mempersoalkan sama ada kerajaan tahu bagaimana menguruskan wabak wabak'"
+       ]
+      },
+      "execution_count": 19,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "input_ids = tokenizer.encode('terjemah Inggeris ke Melayu: PETALING JAYA: Former prime minister Najib Razak has questioned whether the government knows how to manage the Covid-19 pandemic, outlining several seemingly contradictory announcements it has made.', return_tensors = 'pt')\n",
+     "outputs = model.generate(input_ids)\n",
+     "tokenizer.decode(outputs[0])"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 20,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "'<pad> PETALING JAYA: Former Prime Minister Datuk Seri Najib Tun Razak and Deputy Prime Minister Datuk Seri Ismail'"
+       ]
+      },
+      "execution_count": 20,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "input_ids = tokenizer.encode('terjemah Melayu ke Inggeris: PETALING JAYA: Pertemuan bekas Perdana Menteri, Datuk Seri Najib Tun Razak dan Timbalan Perdana Menteri, Datuk Seri Ismail Sabri Yaakob hari ini adalah bagi membincangkan isu berkaitan hala tuju dan dasar negara.', return_tensors = 'pt')\n",
+     "outputs = model.generate(input_ids)\n",
+     "tokenizer.decode(outputs[0])"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 21,
+    "metadata": {},
+    "outputs": [
+     {
+      "data": {
+       "text/plain": [
+        "'<pad> Roman Catholic Archdiocese of Maracaibo shares border with Roman Catholic Diocese'"
+       ]
+      },
+      "execution_count": 21,
+      "metadata": {},
+      "output_type": "execute_result"
+     }
+    ],
+    "source": [
+     "input_ids = tokenizer.encode('grafik pengetahuan: Keuskupan Agung Katolik Rom Maracaibo terletak di barat daya Keuskupan Katolik Rom Machiques.', return_tensors = 'pt')\n",
+     "outputs = model.generate(input_ids)\n",
+     "tokenizer.decode(outputs[0])"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": 22,
+    "metadata": {},
+    "outputs": [],
+    "source": [
+     "!rm -rf t5-tiny-v2"
+    ]
+   },
+   {
+    "cell_type": "code",
+    "execution_count": null,
+    "metadata": {},
+    "outputs": [],
+    "source": []
+   }
+  ],
+  "metadata": {
+   "kernelspec": {
+    "display_name": "Python 3",
+    "language": "python",
+    "name": "python3"
+   },
+   "language_info": {
+    "codemirror_mode": {
+     "name": "ipython",
+     "version": 3
+    },
+    "file_extension": ".py",
+    "mimetype": "text/x-python",
+    "name": "python",
+    "nbconvert_exporter": "python",
+    "pygments_lexer": "ipython3",
+    "version": "3.7.7"
+   }
+  },
+  "nbformat": 4,
+  "nbformat_minor": 4
+ }
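The notebook's conversion steps (build a `T5Config`, load the TensorFlow checkpoint with `load_tf_weights_in_t5`, then save the PyTorch weights and the SentencePiece tokenizer) can be condensed into one function. This is a sketch, not the author's script: `convert_malaya_t5` is a hypothetical name, and it assumes a `transformers` release close to the 4.10.0 recorded in the notebook, with TensorFlow installed so the checkpoint can be read. Newer `transformers` releases may write `model.safetensors` instead of `pytorch_model.bin`, and TensorFlow checkpoint loading may not be available in the most recent versions.

```python
# Hyperparameters passed to T5Config in the notebook (they reproduce config.json).
T5_CONFIG_KWARGS = dict(
    vocab_size=32128,
    n_positions=512,
    d_ff=1344,
    d_kv=64,
    d_model=384,
    dropout_rate=0.1,
    inputs_length=512,  # not a T5Config field; kept as an extra attribute, unused by T5Model
    num_heads=12,
    num_layers=4,
    decoder_start_token_id=0,
    eos_token_id=1,
    pad_token_id=0,
)


def convert_malaya_t5(tf_checkpoint: str, spm_model: str, output_dir: str = "./") -> None:
    """Hypothetical helper: turn a Malaya T5 TF checkpoint into a Hugging Face model directory."""
    # Imported lazily so this module can be imported without transformers/TensorFlow installed.
    from transformers import T5Config, T5Model, T5Tokenizer, load_tf_weights_in_t5

    config = T5Config(**T5_CONFIG_KWARGS)
    model = T5Model(config)
    # Reads the variables from the TF checkpoint and copies them into the PyTorch modules.
    load_tf_weights_in_t5(model, config, tf_checkpoint)
    model.save_pretrained(output_dir)  # writes config.json and the weights file

    # T5Tokenizer wraps the SentencePiece model and adds the 100 <extra_id_*> tokens by default.
    tokenizer = T5Tokenizer(spm_model)
    tokenizer.save_pretrained(output_dir)  # writes spiece.model and the tokenizer JSON files
```

With the paths used in the notebook, the call would be `convert_malaya_t5('t5-tiny-v2/model.ckpt-1000000', 'sp10m.cased.ms-en.model')`. Calling `save_pretrained` directly replaces the notebook's `torch.save(model.state_dict(), ...)` followed by a reload and a second save.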
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:096a412a0a55079b223e1d60244d648fa4a04158a331672bb3461dcdd094dca2
+ size 139080297
sp10m.cased.ms-en.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26de51154cccc9db6e65e5d466bdb0b1fff9fab1d80f4689711de943448addd6
+ size 803030
special_tokens_map.json ADDED
@@ -0,0 +1 @@
+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "additional_special_tokens": ["<extra_id_0>", "<extra_id_1>", "<extra_id_2>", "<extra_id_3>", "<extra_id_4>", "<extra_id_5>", "<extra_id_6>", "<extra_id_7>", "<extra_id_8>", "<extra_id_9>", "<extra_id_10>", "<extra_id_11>", "<extra_id_12>", "<extra_id_13>", "<extra_id_14>", "<extra_id_15>", "<extra_id_16>", "<extra_id_17>", "<extra_id_18>", "<extra_id_19>", "<extra_id_20>", "<extra_id_21>", "<extra_id_22>", "<extra_id_23>", "<extra_id_24>", "<extra_id_25>", "<extra_id_26>", "<extra_id_27>", "<extra_id_28>", "<extra_id_29>", "<extra_id_30>", "<extra_id_31>", "<extra_id_32>", "<extra_id_33>", "<extra_id_34>", "<extra_id_35>", "<extra_id_36>", "<extra_id_37>", "<extra_id_38>", "<extra_id_39>", "<extra_id_40>", "<extra_id_41>", "<extra_id_42>", "<extra_id_43>", "<extra_id_44>", "<extra_id_45>", "<extra_id_46>", "<extra_id_47>", "<extra_id_48>", "<extra_id_49>", "<extra_id_50>", "<extra_id_51>", "<extra_id_52>", "<extra_id_53>", "<extra_id_54>", "<extra_id_55>", "<extra_id_56>", "<extra_id_57>", "<extra_id_58>", "<extra_id_59>", "<extra_id_60>", "<extra_id_61>", "<extra_id_62>", "<extra_id_63>", "<extra_id_64>", "<extra_id_65>", "<extra_id_66>", "<extra_id_67>", "<extra_id_68>", "<extra_id_69>", "<extra_id_70>", "<extra_id_71>", "<extra_id_72>", "<extra_id_73>", "<extra_id_74>", "<extra_id_75>", "<extra_id_76>", "<extra_id_77>", "<extra_id_78>", "<extra_id_79>", "<extra_id_80>", "<extra_id_81>", "<extra_id_82>", "<extra_id_83>", "<extra_id_84>", "<extra_id_85>", "<extra_id_86>", "<extra_id_87>", "<extra_id_88>", "<extra_id_89>", "<extra_id_90>", "<extra_id_91>", "<extra_id_92>", "<extra_id_93>", "<extra_id_94>", "<extra_id_95>", "<extra_id_96>", "<extra_id_97>", "<extra_id_98>", "<extra_id_99>"]}
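The `<extra_id_0>` to `<extra_id_99>` entries are T5's sentinel tokens. In T5's span-corruption objective, each masked span of the input is replaced by one sentinel, and the target lists each sentinel followed by the tokens it hid. The README does not describe how Malaya's "language masking" task was prepared, so this sketch assumes the standard T5 format; it works on whitespace tokens and takes the spans as arguments, whereas T5 itself operates on SentencePiece pieces and samples spans randomly (about 15% of tokens, mean span length 3). The function name `span_corrupt` is hypothetical.

```python
from typing import List, Tuple


def span_corrupt(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[List[str], List[str]]:
    """Replace each [start, end) span with a sentinel; return (inputs, targets).

    `spans` must be sorted and non-overlapping. One sentinel is reserved for the
    target's closing marker, so at most 99 spans fit in the 100 <extra_id_*> tokens.
    """
    if len(spans) > 99:
        raise ValueError("at most 99 spans: one sentinel is reserved for the closing marker")
    inputs: List[str] = []
    targets: List[str] = []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        if start < cursor or end <= start or end > len(tokens):
            raise ValueError(f"invalid or overlapping span {(start, end)}")
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep the unmasked tokens
        inputs.append(sentinel)  # the whole span collapses to one sentinel
        targets.append(sentinel)
        targets.extend(tokens[start:end])  # the target spells out what was hidden
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{len(spans)}>")  # final sentinel closes the target
    return inputs, targets


# "Kuala Lumpur is the capital of Malaysia"
tokens = "Kuala Lumpur ialah ibu negara Malaysia".split()
inputs, targets = span_corrupt(tokens, [(1, 2), (3, 5)])
print(" ".join(inputs))  # Kuala <extra_id_0> ialah <extra_id_1> Malaysia
print(" ".join(targets))  # <extra_id_0> Lumpur <extra_id_1> ibu negara <extra_id_2>
```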
spiece.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:26de51154cccc9db6e65e5d466bdb0b1fff9fab1d80f4689711de943448addd6
+ size 803030
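`spiece.model` and `sp10m.cased.ms-en.model` are stored as Git LFS pointers with the same `oid` and `size`, so they hold identical bytes: in the notebook, `tokenizer.save_pretrained` wrote `spiece.model` as a copy of `sp10m.cased.ms-en.model`. An LFS pointer is three `key value` lines (the spec version, `oid sha256:<hex>`, and `size` in bytes). A minimal sketch that parses such pointers and checks a downloaded file against one; the function names are hypothetical, and it only handles the three keys shown here.

```python
import hashlib
from pathlib import Path

# Pointer texts copied from the diff above.
SPIECE_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:26de51154cccc9db6e65e5d466bdb0b1fff9fab1d80f4689711de943448addd6
size 803030
"""
SP10M_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:26de51154cccc9db6e65e5d466bdb0b1fff9fab1d80f4689711de943448addd6
size 803030
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer into {'version', 'sha256', 'size'}."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, _, digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(path: str, pointer: dict) -> bool:
    """True if the file at `path` has the size and SHA-256 digest recorded in `pointer`."""
    file = Path(path)
    if file.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing
    digest = hashlib.sha256()
    with file.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    return digest.hexdigest() == pointer["sha256"]


spiece = parse_lfs_pointer(SPIECE_POINTER)
sp10m = parse_lfs_pointer(SP10M_POINTER)
print(spiece == sp10m)  # True
print(spiece["size"])  # 803030
```

After fetching the real file (for example with `git lfs pull`), `matches_pointer('spiece.model', spiece)` should return `True` if the download is intact.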
tokenizer_config.json ADDED
@@ -0,0 +1 @@
+ {"eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>", "extra_ids": 100, "additional_special_tokens": ["<extra_id_0>", "<extra_id_1>", "<extra_id_2>", "<extra_id_3>", "<extra_id_4>", "<extra_id_5>", "<extra_id_6>", "<extra_id_7>", "<extra_id_8>", "<extra_id_9>", "<extra_id_10>", "<extra_id_11>", "<extra_id_12>", "<extra_id_13>", "<extra_id_14>", "<extra_id_15>", "<extra_id_16>", "<extra_id_17>", "<extra_id_18>", "<extra_id_19>", "<extra_id_20>", "<extra_id_21>", "<extra_id_22>", "<extra_id_23>", "<extra_id_24>", "<extra_id_25>", "<extra_id_26>", "<extra_id_27>", "<extra_id_28>", "<extra_id_29>", "<extra_id_30>", "<extra_id_31>", "<extra_id_32>", "<extra_id_33>", "<extra_id_34>", "<extra_id_35>", "<extra_id_36>", "<extra_id_37>", "<extra_id_38>", "<extra_id_39>", "<extra_id_40>", "<extra_id_41>", "<extra_id_42>", "<extra_id_43>", "<extra_id_44>", "<extra_id_45>", "<extra_id_46>", "<extra_id_47>", "<extra_id_48>", "<extra_id_49>", "<extra_id_50>", "<extra_id_51>", "<extra_id_52>", "<extra_id_53>", "<extra_id_54>", "<extra_id_55>", "<extra_id_56>", "<extra_id_57>", "<extra_id_58>", "<extra_id_59>", "<extra_id_60>", "<extra_id_61>", "<extra_id_62>", "<extra_id_63>", "<extra_id_64>", "<extra_id_65>", "<extra_id_66>", "<extra_id_67>", "<extra_id_68>", "<extra_id_69>", "<extra_id_70>", "<extra_id_71>", "<extra_id_72>", "<extra_id_73>", "<extra_id_74>", "<extra_id_75>", "<extra_id_76>", "<extra_id_77>", "<extra_id_78>", "<extra_id_79>", "<extra_id_80>", "<extra_id_81>", "<extra_id_82>", "<extra_id_83>", "<extra_id_84>", "<extra_id_85>", "<extra_id_86>", "<extra_id_87>", "<extra_id_88>", "<extra_id_89>", "<extra_id_90>", "<extra_id_91>", "<extra_id_92>", "<extra_id_93>", "<extra_id_94>", "<extra_id_95>", "<extra_id_96>", "<extra_id_97>", "<extra_id_98>", "<extra_id_99>"], "sp_model_kwargs": {}, "tokenizer_class": "T5Tokenizer"}