Rachel1809 committed on
Commit
fc6b132
1 Parent(s): 017304c

Upload Toxic_comment_classification.ipynb

Files changed (1)
  1. Toxic_comment_classification.ipynb +1810 -0
Toxic_comment_classification.ipynb ADDED
@@ -0,0 +1,1810 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "code",
5
+ "execution_count": null,
6
+ "metadata": {
7
+ "colab": {
8
+ "base_uri": "https://localhost:8080/"
9
+ },
10
+ "id": "8DfEKlbt_TMI",
11
+ "outputId": "79666846-0691-490a-88b0-5f56f4769772"
12
+ },
13
+ "outputs": [
14
+ {
15
+ "output_type": "stream",
16
+ "name": "stdout",
17
+ "text": [
18
+ "Mounted at /content/drive/\n"
19
+ ]
20
+ }
21
+ ],
22
+ "source": [
23
+ "from google.colab import drive\n",
24
+ "drive.mount('/content/drive/')"
25
+ ],
26
+ "id": "8DfEKlbt_TMI"
27
+ },
28
+ {
29
+ "cell_type": "markdown",
30
+ "metadata": {
31
+ "id": "8c25705b"
32
+ },
33
+ "source": [
34
+ "# 1. Import libraries and load data"
35
+ ],
36
+ "id": "8c25705b"
37
+ },
38
+ {
39
+ "cell_type": "code",
40
+ "execution_count": null,
41
+ "metadata": {
42
+ "id": "5b07ecd3"
43
+ },
44
+ "outputs": [],
45
+ "source": [
46
+ "import os\n",
47
+ "import pandas as pd\n",
48
+ "import tensorflow as tf\n",
49
+ "import numpy as np"
50
+ ],
51
+ "id": "5b07ecd3"
52
+ },
53
+ {
54
+ "cell_type": "code",
55
+ "execution_count": null,
56
+ "metadata": {
57
+ "id": "91d7e1f0"
58
+ },
59
+ "outputs": [],
60
+ "source": [
61
+ "df = pd.read_csv(os.path.join(\"/content/drive/MyDrive/ColabNotebooks/data\", \"train.csv\"))"
62
+ ],
63
+ "id": "91d7e1f0"
64
+ },
65
+ {
66
+ "cell_type": "code",
67
+ "execution_count": null,
68
+ "metadata": {
69
+ "colab": {
70
+ "base_uri": "https://localhost:8080/",
71
+ "height": 815
72
+ },
73
+ "id": "1be479a4",
74
+ "outputId": "88d487c7-8f13-43fe-e866-3c472a6f03d9"
75
+ },
76
+ "outputs": [
77
+ {
78
+ "output_type": "execute_result",
79
+ "data": {
80
+ "text/plain": [
81
+ " id comment_text \\\n",
82
+ "0 0000997932d777bf Explanation\\nWhy the edits made under my usern... \n",
83
+ "1 000103f0d9cfb60f D'aww! He matches this background colour I'm s... \n",
84
+ "2 000113f07ec002fd Hey man, I'm really not trying to edit war. It... \n",
85
+ "3 0001b41b1c6bb37e \"\\nMore\\nI can't make any real suggestions on ... \n",
86
+ "4 0001d958c54c6e35 You, sir, are my hero. Any chance you remember... \n",
87
+ "... ... ... \n",
88
+ "159566 ffe987279560d7ff \":::::And for the second time of asking, when ... \n",
89
+ "159567 ffea4adeee384e90 You should be ashamed of yourself \\n\\nThat is ... \n",
90
+ "159568 ffee36eab5c267c9 Spitzer \\n\\nUmm, theres no actual article for ... \n",
91
+ "159569 fff125370e4aaaf3 And it looks like it was actually you who put ... \n",
92
+ "159570 fff46fc426af1f9a \"\\nAnd ... I really don't think you understand... \n",
93
+ "\n",
94
+ " toxic severe_toxic obscene threat insult identity_hate \n",
95
+ "0 0 0 0 0 0 0 \n",
96
+ "1 0 0 0 0 0 0 \n",
97
+ "2 0 0 0 0 0 0 \n",
98
+ "3 0 0 0 0 0 0 \n",
99
+ "4 0 0 0 0 0 0 \n",
100
+ "... ... ... ... ... ... ... \n",
101
+ "159566 0 0 0 0 0 0 \n",
102
+ "159567 0 0 0 0 0 0 \n",
103
+ "159568 0 0 0 0 0 0 \n",
104
+ "159569 0 0 0 0 0 0 \n",
105
+ "159570 0 0 0 0 0 0 \n",
106
+ "\n",
107
+ "[159571 rows x 8 columns]"
108
+ ]
344
+ },
345
+ "metadata": {},
346
+ "execution_count": 4
347
+ }
348
+ ],
349
+ "source": [
350
+ "df"
351
+ ],
352
+ "id": "1be479a4"
353
+ },
354
+ {
355
+ "cell_type": "markdown",
356
+ "metadata": {
357
+ "id": "e352d92f"
358
+ },
359
+ "source": [
360
+ "# 2. Preprocessing"
361
+ ],
362
+ "id": "e352d92f"
363
+ },
364
+ {
365
+ "cell_type": "markdown",
366
+ "metadata": {
367
+ "id": "dc5fe893"
368
+ },
369
+ "source": [
370
+ "## 2.1. Data overview"
371
+ ],
372
+ "id": "dc5fe893"
373
+ },
374
+ {
375
+ "cell_type": "code",
376
+ "execution_count": null,
377
+ "metadata": {
378
+ "colab": {
379
+ "base_uri": "https://localhost:8080/",
380
+ "height": 424
381
+ },
382
+ "id": "ea6fd11e",
383
+ "outputId": "adb8a890-565d-4e5b-da14-d7f11db89735"
384
+ },
385
+ "outputs": [
386
+ {
387
+ "output_type": "execute_result",
388
+ "data": {
389
+ "text/plain": [
390
+ " toxic severe_toxic obscene threat insult identity_hate\n",
391
+ "0 0 0 0 0 0 0\n",
392
+ "1 0 0 0 0 0 0\n",
393
+ "2 0 0 0 0 0 0\n",
394
+ "3 0 0 0 0 0 0\n",
395
+ "4 0 0 0 0 0 0\n",
396
+ "... ... ... ... ... ... ...\n",
397
+ "159566 0 0 0 0 0 0\n",
398
+ "159567 0 0 0 0 0 0\n",
399
+ "159568 0 0 0 0 0 0\n",
400
+ "159569 0 0 0 0 0 0\n",
401
+ "159570 0 0 0 0 0 0\n",
402
+ "\n",
403
+ "[159571 rows x 6 columns]"
404
+ ]
616
+ },
617
+ "metadata": {},
618
+ "execution_count": 5
619
+ }
620
+ ],
621
+ "source": [
622
+ "df[df.columns[2:]]"
623
+ ],
624
+ "id": "ea6fd11e"
625
+ },
626
+ {
627
+ "cell_type": "code",
628
+ "execution_count": null,
629
+ "metadata": {
630
+ "colab": {
631
+ "base_uri": "https://localhost:8080/",
632
+ "height": 389
633
+ },
634
+ "id": "7eb94a81",
635
+ "outputId": "c765800b-02a7-4e91-fe92-ae13b8d943ba"
636
+ },
637
+ "outputs": [
638
+ {
639
+ "output_type": "execute_result",
640
+ "data": {
641
+ "text/plain": [
642
+ " id comment_text \\\n",
643
+ "6 0002bcb3da6cb337 COCKSUCKER BEFORE YOU PISS AROUND ON MY WORK \n",
644
+ "12 0005c987bdfc9d4b Hey... what is it..\\n@ | talk .\\nWhat is it...... \n",
645
+ "16 0007e25b2121310b Bye! \\n\\nDon't look, come or think of comming ... \n",
646
+ "42 001810bf8c45bf5f You are gay or antisemmitian? \\n\\nArchangel WH... \n",
647
+ "43 00190820581d90ce FUCK YOUR FILTHY MOTHER IN THE ASS, DRY! \n",
648
+ "\n",
649
+ " toxic severe_toxic obscene threat insult identity_hate \n",
650
+ "6 1 1 1 0 1 0 \n",
651
+ "12 1 0 0 0 0 0 \n",
652
+ "16 1 0 0 0 0 0 \n",
653
+ "42 1 0 1 0 1 1 \n",
654
+ "43 1 0 1 0 1 0 "
655
+ ]
824
+ },
825
+ "metadata": {},
826
+ "execution_count": 6
827
+ }
828
+ ],
829
+ "source": [
830
+ "df.loc[df.iloc[:, 2]==1].head()"
831
+ ],
832
+ "id": "7eb94a81"
833
+ },
834
+ {
835
+ "cell_type": "code",
836
+ "execution_count": null,
837
+ "metadata": {
838
+ "colab": {
839
+ "base_uri": "https://localhost:8080/",
840
+ "height": 87
841
+ },
842
+ "id": "2bb35d57",
843
+ "outputId": "c9531968-a5f4-4348-a833-4a366ee59010"
844
+ },
845
+ "outputs": [
846
+ {
847
+ "output_type": "execute_result",
848
+ "data": {
849
+ "text/plain": [
850
+ "'Hey... what is it..\\n@ | talk .\\nWhat is it... an exclusive group of some WP TALIBANS...who are good at destroying, self-appointed purist who GANG UP any one who asks them questions abt their ANTI-SOCIAL and DESTRUCTIVE (non)-contribution at WP?\\n\\nAsk Sityush to clean up his behavior than issue me nonsensical warnings...'"
851
+ ],
852
+ "application/vnd.google.colaboratory.intrinsic+json": {
853
+ "type": "string"
854
+ }
855
+ },
856
+ "metadata": {},
857
+ "execution_count": 7
858
+ }
859
+ ],
860
+ "source": [
861
+ "df.iloc[12].comment_text"
862
+ ],
863
+ "id": "2bb35d57"
864
+ },
865
+ {
866
+ "cell_type": "markdown",
867
+ "metadata": {
868
+ "id": "1fdd25c4"
869
+ },
870
+ "source": [
871
+ "## 2.2. Data preprocessing"
872
+ ],
873
+ "id": "1fdd25c4"
874
+ },
875
+ {
876
+ "cell_type": "code",
877
+ "execution_count": null,
878
+ "metadata": {
879
+ "id": "c8bd9d59"
880
+ },
881
+ "outputs": [],
882
+ "source": [
883
+ "from tensorflow.keras.layers import TextVectorization"
884
+ ],
885
+ "id": "c8bd9d59"
886
+ },
887
+ {
888
+ "cell_type": "code",
889
+ "execution_count": null,
890
+ "metadata": {
891
+ "colab": {
892
+ "base_uri": "https://localhost:8080/"
893
+ },
894
+ "id": "b8c03840",
895
+ "outputId": "d16ec2c2-b1d1-4956-a11b-6f8e1960a5b8"
896
+ },
897
+ "outputs": [
898
+ {
899
+ "output_type": "execute_result",
900
+ "data": {
901
+ "text/plain": [
902
+ "Index(['id', 'comment_text', 'toxic', 'severe_toxic', 'obscene', 'threat',\n",
903
+ " 'insult', 'identity_hate'],\n",
904
+ " dtype='object')"
905
+ ]
906
+ },
907
+ "metadata": {},
908
+ "execution_count": 9
909
+ }
910
+ ],
911
+ "source": [
912
+ "df.columns"
913
+ ],
914
+ "id": "b8c03840"
915
+ },
916
+ {
917
+ "cell_type": "code",
918
+ "execution_count": null,
919
+ "metadata": {
920
+ "id": "2e64c456"
921
+ },
922
+ "outputs": [],
923
+ "source": [
924
+ "X = df.comment_text\n",
925
+ "y = df.iloc[:,2:].values"
926
+ ],
927
+ "id": "2e64c456"
928
+ },
929
+ {
930
+ "cell_type": "code",
931
+ "execution_count": null,
932
+ "metadata": {
933
+ "id": "c924ed65"
934
+ },
935
+ "outputs": [],
936
+ "source": [
937
+ "# maximum vocabulary size kept by the vectorizer\n",
938
+ "MAX_VOCAB = 200000"
939
+ ],
940
+ "id": "c924ed65"
941
+ },
942
+ {
943
+ "cell_type": "code",
944
+ "execution_count": null,
945
+ "metadata": {
946
+ "id": "d9e74b26"
947
+ },
948
+ "outputs": [],
949
+ "source": [
950
+ "vectorizer = TextVectorization(max_tokens=MAX_VOCAB, \n",
951
+ " output_sequence_length=1800, \n",
952
+ " output_mode='int')"
953
+ ],
954
+ "id": "d9e74b26"
955
+ },
956
+ {
957
+ "cell_type": "code",
958
+ "execution_count": null,
959
+ "metadata": {
960
+ "id": "b89a019a"
961
+ },
962
+ "outputs": [],
963
+ "source": [
964
+ "vectorizer.adapt(X.values)"
965
+ ],
966
+ "id": "b89a019a"
967
+ },
968
+ {
969
+ "cell_type": "code",
970
+ "execution_count": null,
971
+ "metadata": {
972
+ "colab": {
973
+ "base_uri": "https://localhost:8080/"
974
+ },
975
+ "id": "832c78b5",
976
+ "outputId": "c5d8489f-1b6e-4bcc-e4e8-42359d85c4ce"
977
+ },
978
+ "outputs": [
979
+ {
980
+ "output_type": "execute_result",
981
+ "data": {
982
+ "text/plain": [
983
+ "<tf.Tensor: shape=(6,), dtype=int64, numpy=array([288, 263, 191, 3, 14, 463])>"
984
+ ]
985
+ },
986
+ "metadata": {},
987
+ "execution_count": 14
988
+ }
989
+ ],
990
+ "source": [
991
+ "vectorizer('Hello world, welcome to this project')[:6]"
992
+ ],
993
+ "id": "832c78b5"
994
+ },
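To make the integer output of the vectorizer less opaque, here is a minimal pure-Python sketch of what an `output_mode='int'` vectorizer roughly does: lowercase, strip punctuation, rank tokens by frequency, and pad to a fixed length. The helpers `standardize`, `build_vocab`, and `vectorize` are hypothetical names for illustration, not part of Keras, and the punctuation rule is only an approximation of the layer's default. The one detail mirrored from the real layer is that id 0 is reserved for padding and id 1 for out-of-vocabulary tokens.

```python
import re
from collections import Counter

PAD_ID, OOV_ID = 0, 1  # reserved ids, as in TextVectorization's 'int' mode


def standardize(text):
    # Lowercase and drop punctuation (an approximation of the layer's default).
    return re.sub(r"[^\w\s]", "", text.lower())


def build_vocab(corpus, max_tokens):
    # Count tokens over the whole corpus; frequent tokens get the smallest ids.
    counts = Counter(tok for doc in corpus for tok in standardize(doc).split())
    # Two slots of the budget are taken by the padding and OOV ids.
    most_common = counts.most_common(max_tokens - 2)
    return {tok: i + 2 for i, (tok, _) in enumerate(most_common)}


def vectorize(text, vocab, seq_len):
    # Map tokens to ids, truncate to seq_len, then right-pad with PAD_ID.
    ids = [vocab.get(tok, OOV_ID) for tok in standardize(text).split()]
    ids = ids[:seq_len]
    return ids + [PAD_ID] * (seq_len - len(ids))


corpus = ["Hello world", "hello there, world", "hello again"]
vocab = build_vocab(corpus, max_tokens=10)
print(vectorize("Hello, unseen world!", vocab, seq_len=6))
```

This is why the vectorized comments in the notebook end in long runs of zeros: most comments are far shorter than the 1800-token output length.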
995
+ {
996
+ "cell_type": "code",
997
+ "execution_count": null,
998
+ "metadata": {
999
+ "id": "d90fea8a"
1000
+ },
1001
+ "outputs": [],
1002
+ "source": [
1003
+ "processed_text = vectorizer(X.values)"
1004
+ ],
1005
+ "id": "d90fea8a"
1006
+ },
1007
+ {
1008
+ "cell_type": "code",
1009
+ "execution_count": null,
1010
+ "metadata": {
1011
+ "colab": {
1012
+ "base_uri": "https://localhost:8080/"
1013
+ },
1014
+ "id": "9891f1b3",
1015
+ "outputId": "16715f82-cc03-4bc9-bc4c-0d964111d0d3"
1016
+ },
1017
+ "outputs": [
1018
+ {
1019
+ "output_type": "execute_result",
1020
+ "data": {
1021
+ "text/plain": [
1022
+ "<tf.Tensor: shape=(159571, 1800), dtype=int64, numpy=\n",
1023
+ "array([[ 645, 76, 2, ..., 0, 0, 0],\n",
1024
+ " [ 1, 54, 2489, ..., 0, 0, 0],\n",
1025
+ " [ 425, 441, 70, ..., 0, 0, 0],\n",
1026
+ " ...,\n",
1027
+ " [32445, 7392, 383, ..., 0, 0, 0],\n",
1028
+ " [ 5, 12, 534, ..., 0, 0, 0],\n",
1029
+ " [ 5, 8, 130, ..., 0, 0, 0]])>"
1030
+ ]
1031
+ },
1032
+ "metadata": {},
1033
+ "execution_count": 16
1034
+ }
1035
+ ],
1036
+ "source": [
1037
+ "processed_text"
1038
+ ],
1039
+ "id": "9891f1b3"
1040
+ },
1041
+ {
1042
+ "cell_type": "code",
1043
+ "execution_count": null,
1044
+ "metadata": {
1045
+ "id": "9176a3c0"
1046
+ },
1047
+ "outputs": [],
1048
+ "source": [
1049
+ "# MCSHBAP - map, cache, shuffle, batch, prefetch\n",
1050
+ "# from_tensor_slices OR list_files\n",
1051
+ "data = tf.data.Dataset.from_tensor_slices((processed_text, y))\n",
1052
+ "data = data.cache()\n",
1053
+ "data = data.shuffle(160000)\n",
1054
+ "data = data.batch(16)\n",
1055
+ "data = data.prefetch(8) # prefetch batches to prevent an input bottleneck"
1056
+ ],
1057
+ "id": "9176a3c0"
1058
+ },
1059
+ {
1060
+ "cell_type": "code",
1061
+ "execution_count": null,
1062
+ "metadata": {
1063
+ "id": "042126d5"
1064
+ },
1065
+ "outputs": [],
1066
+ "source": [
1067
+ "batch_X, batch_y = data.as_numpy_iterator().next()"
1068
+ ],
1069
+ "id": "042126d5"
1070
+ },
1071
+ {
1072
+ "cell_type": "code",
1073
+ "execution_count": null,
1074
+ "metadata": {
1075
+ "colab": {
1076
+ "base_uri": "https://localhost:8080/"
1077
+ },
1078
+ "id": "5b73aea2",
1079
+ "outputId": "be6a586c-2d6e-459a-8748-ae4b4ec03125"
1080
+ },
1081
+ "outputs": [
1082
+ {
1083
+ "output_type": "execute_result",
1084
+ "data": {
1085
+ "text/plain": [
1086
+ "(16, 1800)"
1087
+ ]
1088
+ },
1089
+ "metadata": {},
1090
+ "execution_count": 19
1091
+ }
1092
+ ],
1093
+ "source": [
1094
+ "batch_X.shape"
1095
+ ],
1096
+ "id": "5b73aea2"
1097
+ },
1098
+ {
1099
+ "cell_type": "code",
1100
+ "execution_count": null,
1101
+ "metadata": {
1102
+ "id": "8286ce71"
1103
+ },
1104
+ "outputs": [],
1105
+ "source": [
1106
+ "train = data.take(int(len(data) * .7))\n",
1107
+ "val = data.skip(int(len(data) * .7)).take(int(len(data)*.2))\n",
1108
+ "test = data.skip(int(len(data) * .9)).take(int(len(data)*.1))"
1109
+ ],
1110
+ "id": "8286ce71"
1111
+ },
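The 70/20/10 split relies on `take` and `skip` selecting disjoint ranges of batches. The sketch below reproduces that arithmetic in plain Python on batch indices only, assuming 9974 batches (159571 rows in batches of 16, rounded up). The helpers `take` and `skip` are hypothetical stand-ins for the `tf.data` methods. It also shows that calling `take` twice for the test split would return batches already in `train`, whereas `skip` followed by `take` does not.

```python
from itertools import islice


def take(seq, n):
    # First n items, like Dataset.take(n).
    return list(islice(seq, n))


def skip(seq, n):
    # Everything after the first n items, like Dataset.skip(n).
    return list(islice(seq, n, None))


n_batches = 9974  # assumed: ceil(159571 / 16)
batches = range(n_batches)

n_train = int(n_batches * .7)
n_val = int(n_batches * .2)
n_test = int(n_batches * .1)

train_ids = take(batches, n_train)
val_ids = take(skip(batches, n_train), n_val)
test_ids = take(skip(batches, int(n_batches * .9)), n_test)

# What happens if the test split uses take(...).take(...) instead of skip(...).take(...)
overlapping_test_ids = take(take(batches, int(n_batches * .9)), n_test)

print(len(train_ids), len(val_ids), len(test_ids))
print("test disjoint from train:", not set(test_ids) & set(train_ids))
print("take/take test inside train:", set(overlapping_test_ids) <= set(train_ids))
```

One caveat: this reasoning assumes a fixed batch order. With `shuffle` upstream, `tf.data` reshuffles on each iteration by default, so the batches that land in each split can change between epochs. Splitting before shuffling is one way to avoid that.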
1112
+ {
1113
+ "cell_type": "code",
1114
+ "execution_count": null,
1115
+ "metadata": {
1116
+ "colab": {
1117
+ "base_uri": "https://localhost:8080/"
1118
+ },
1119
+ "id": "f06e8067",
1120
+ "outputId": "8dadd560-5bfb-4d58-8301-d9f56f30a0b0"
1121
+ },
1122
+ "outputs": [
1123
+ {
1124
+ "output_type": "execute_result",
1125
+ "data": {
1126
+ "text/plain": [
1127
+ "6981"
1128
+ ]
1129
+ },
1130
+ "metadata": {},
1131
+ "execution_count": 21
1132
+ }
1133
+ ],
1134
+ "source": [
1135
+ "len(train)"
1136
+ ],
1137
+ "id": "f06e8067"
1138
+ },
1139
+ {
1140
+ "cell_type": "code",
1141
+ "execution_count": null,
1142
+ "metadata": {
1143
+ "colab": {
1144
+ "base_uri": "https://localhost:8080/"
1145
+ },
1146
+ "id": "74d5fb4e",
1147
+ "outputId": "7ddc0d55-360e-4283-a955-ce3dab49fc07"
1148
+ },
1149
+ "outputs": [
1150
+ {
1151
+ "output_type": "execute_result",
1152
+ "data": {
1153
+ "text/plain": [
1154
+ "(array([[ 5495, 51, 29, ..., 0, 0, 0],\n",
1155
+ " [ 33, 7, 69, ..., 0, 0, 0],\n",
1156
+ " [ 24, 1805, 2256, ..., 0, 0, 0],\n",
1157
+ " ...,\n",
1158
+ " [ 46, 1377, 31, ..., 0, 0, 0],\n",
1159
+ " [ 4354, 41514, 8, ..., 0, 0, 0],\n",
1160
+ " [ 215, 8, 477, ..., 0, 0, 0]]),\n",
1161
+ " array([[0, 0, 0, 0, 0, 0],\n",
1162
+ " [0, 0, 0, 0, 0, 0],\n",
1163
+ " [0, 0, 0, 0, 0, 0],\n",
1164
+ " [0, 0, 0, 0, 0, 0],\n",
1165
+ " [0, 0, 0, 0, 0, 0],\n",
1166
+ " [0, 0, 0, 0, 0, 0],\n",
1167
+ " [0, 0, 0, 0, 0, 0],\n",
1168
+ " [0, 0, 0, 0, 0, 0],\n",
1169
+ " [0, 0, 0, 0, 0, 0],\n",
1170
+ " [0, 0, 0, 0, 0, 0],\n",
1171
+ " [0, 0, 0, 0, 0, 0],\n",
1172
+ " [0, 0, 0, 0, 0, 0],\n",
1173
+ " [0, 0, 0, 0, 0, 0],\n",
1174
+ " [0, 0, 0, 0, 0, 0],\n",
1175
+ " [0, 0, 0, 0, 0, 0],\n",
1176
+ " [0, 0, 0, 0, 0, 0]]))"
1177
+ ]
1178
+ },
1179
+ "metadata": {},
1180
+ "execution_count": 22
1181
+ }
1182
+ ],
1183
+ "source": [
1184
+ "train.as_numpy_iterator().next()"
1185
+ ],
1186
+ "id": "74d5fb4e"
1187
+ },
1188
+ {
1189
+ "cell_type": "markdown",
1190
+ "metadata": {
1191
+ "id": "-8f_Bi-OAc03"
1192
+ },
1193
+ "source": [
1194
+ "# 3. Building model"
1195
+ ],
1196
+ "id": "-8f_Bi-OAc03"
1197
+ },
1198
+ {
1199
+ "cell_type": "code",
1200
+ "execution_count": null,
1201
+ "metadata": {
1202
+ "id": "ItiVy4-S1pK5"
1203
+ },
1204
+ "outputs": [],
1205
+ "source": [
1206
+ "from tensorflow.keras.models import Sequential\n",
1207
+ "from tensorflow.keras.layers import LSTM, Dropout, Bidirectional, Dense, Embedding"
1208
+ ],
1209
+ "id": "ItiVy4-S1pK5"
1210
+ },
1211
+ {
1212
+ "cell_type": "code",
1213
+ "execution_count": null,
1214
+ "metadata": {
1215
+ "id": "8U9TmSbxAvEw"
1216
+ },
1217
+ "outputs": [],
1218
+ "source": [
1219
+ "model = Sequential()\n",
1220
+ "model.add(Embedding(MAX_VOCAB + 1, 32))\n",
1221
+ "model.add(Bidirectional(LSTM(32, activation='tanh')))\n",
1222
+ "model.add(Dense(128, activation='relu'))\n",
1223
+ "model.add(Dense(256, activation='relu'))\n",
1224
+ "model.add(Dense(128, activation='relu'))\n",
1225
+ "model.add(Dense(64, activation='relu'))\n",
1226
+ "model.add(Dense(32, activation='relu'))\n",
1227
+ "model.add(Dense(6, activation='sigmoid'))"
1228
+ ],
1229
+ "id": "8U9TmSbxAvEw"
1230
+ },
1231
+ {
1232
+ "cell_type": "code",
1233
+ "execution_count": null,
1234
+ "metadata": {
1235
+ "id": "pF_pooL4CY91"
1236
+ },
1237
+ "outputs": [],
1238
+ "source": [
1239
+ "model.compile(loss='BinaryCrossentropy', optimizer='Adam')"
1240
+ ],
1241
+ "id": "pF_pooL4CY91"
1242
+ },
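Because each comment can carry several of the six labels at once, the model uses one sigmoid per label and a binary cross-entropy loss averaged over the labels. A minimal pure-Python sketch of that per-sample loss follows. The function name and the clipping constant `eps` are assumptions for illustration, not the Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Mean of the per-label log losses for one sample.
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


# Six labels: toxic, severe_toxic, obscene, threat, insult, identity_hate
y_true = [1, 0, 1, 0, 0, 0]
confident = [0.99, 0.01, 0.97, 0.01, 0.02, 0.01]
unsure = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]

print(binary_crossentropy(y_true, confident))
print(binary_crossentropy(y_true, unsure))
```

A prediction of 0.5 for every label gives a loss of ln 2 (about 0.693). Confident, correct predictions push the loss toward zero, which is the direction the training log moves in.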
1243
+ {
1244
+ "cell_type": "code",
1245
+ "execution_count": null,
1246
+ "metadata": {
1247
+ "colab": {
1248
+ "base_uri": "https://localhost:8080/"
1249
+ },
1250
+ "id": "ZtPm1Gp2GJza",
1251
+ "outputId": "222a35f6-4ad4-4c16-a240-3a18c0392525"
1252
+ },
1253
+ "outputs": [
1254
+ {
1255
+ "output_type": "stream",
1256
+ "name": "stdout",
1257
+ "text": [
1258
+ "Model: \"sequential_2\"\n",
1259
+ "_________________________________________________________________\n",
1260
+ " Layer (type) Output Shape Param # \n",
1261
+ "=================================================================\n",
1262
+ " embedding_2 (Embedding) (None, None, 32) 6400032 \n",
1263
+ " \n",
1264
+ " bidirectional_2 (Bidirectio (None, 64) 16640 \n",
1265
+ " nal) \n",
1266
+ " \n",
1267
+ " dense_12 (Dense) (None, 128) 8320 \n",
1268
+ " \n",
1269
+ " dense_13 (Dense) (None, 256) 33024 \n",
1270
+ " \n",
1271
+ " dense_14 (Dense) (None, 128) 32896 \n",
1272
+ " \n",
1273
+ " dense_15 (Dense) (None, 64) 8256 \n",
1274
+ " \n",
1275
+ " dense_16 (Dense) (None, 32) 2080 \n",
1276
+ " \n",
1277
+ " dense_17 (Dense) (None, 6) 198 \n",
1278
+ " \n",
1279
+ "=================================================================\n",
1280
+ "Total params: 6,501,446\n",
1281
+ "Trainable params: 6,501,446\n",
1282
+ "Non-trainable params: 0\n",
1283
+ "_________________________________________________________________\n"
1284
+ ]
1285
+ }
1286
+ ],
1287
+ "source": [
1288
+ "model.summary()"
1289
+ ],
1290
+ "id": "ZtPm1Gp2GJza"
1291
+ },
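The parameter counts in the summary can be checked by hand. The sketch below recomputes them from the layer sizes, assuming the standard formulas for an embedding table, an LSTM with biases (four gates), and a dense layer with biases. The helper names are hypothetical.

```python
def embedding_params(vocab_size, dim):
    # One dim-sized vector per vocabulary id.
    return vocab_size * dim


def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(n_in, n_out):
    # Weight matrix plus one bias per output unit.
    return n_in * n_out + n_out


MAX_VOCAB = 200000

total = embedding_params(MAX_VOCAB + 1, 32)
total += 2 * lstm_params(32, 32)  # Bidirectional wraps two LSTMs

# Bidirectional output (2 * 32 = 64) feeds the stack of Dense layers.
widths = [64, 128, 256, 128, 64, 32, 6]
for n_in, n_out in zip(widths, widths[1:]):
    total += dense_params(n_in, n_out)

print(total)  # 6501446
```

Nearly all of the parameters sit in the embedding table, so the vocabulary size dominates the model size.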
1292
+ {
1293
+ "cell_type": "code",
1294
+ "execution_count": null,
1295
+ "metadata": {
1296
+ "colab": {
1297
+ "base_uri": "https://localhost:8080/"
1298
+ },
1299
+ "id": "Cu-uCQaEJpjK",
1300
+ "outputId": "dd00becf-d085-47d2-ad04-fc121471ebef"
1301
+ },
1302
+ "outputs": [
1303
+ {
1304
+ "output_type": "stream",
1305
+ "name": "stdout",
1306
+ "text": [
1307
+ "Epoch 1/10\n",
1308
+ "6981/6981 [==============================] - 642s 92ms/step - loss: 0.0645 - val_loss: 0.0441\n",
1309
+ "Epoch 2/10\n",
1310
+ "6981/6981 [==============================] - 639s 91ms/step - loss: 0.0458 - val_loss: 0.0398\n",
1311
+ "Epoch 3/10\n",
1312
+ "6981/6981 [==============================] - 660s 94ms/step - loss: 0.0412 - val_loss: 0.0366\n",
1313
+ "Epoch 4/10\n",
1314
+ "6981/6981 [==============================] - 639s 91ms/step - loss: 0.0371 - val_loss: 0.0335\n",
1315
+ "Epoch 5/10\n",
1316
+ "6981/6981 [==============================] - 648s 93ms/step - loss: 0.0335 - val_loss: 0.0297\n",
1317
+ "Epoch 6/10\n",
1318
+ "6981/6981 [==============================] - 634s 91ms/step - loss: 0.0307 - val_loss: 0.0261\n",
1319
+ "Epoch 7/10\n",
1320
+ "6981/6981 [==============================] - 634s 91ms/step - loss: 0.0278 - val_loss: 0.0254\n",
1321
+ "Epoch 8/10\n",
1322
+ "6981/6981 [==============================] - 634s 91ms/step - loss: 0.0252 - val_loss: 0.0231\n",
1323
+ "Epoch 9/10\n",
1324
+ "6981/6981 [==============================] - 623s 89ms/step - loss: 0.0234 - val_loss: 0.0193\n",
1325
+ "Epoch 10/10\n",
1326
+ "6981/6981 [==============================] - 627s 90ms/step - loss: 0.0214 - val_loss: 0.0197\n"
1327
+ ]
1328
+ }
1329
+ ],
1330
+ "source": [
1331
+ "history = model.fit(train, epochs=10, validation_data=val)"
1332
+ ],
1333
+ "id": "Cu-uCQaEJpjK"
1334
+ },
1335
+ {
1336
+ "cell_type": "code",
1337
+ "execution_count": null,
1338
+ "metadata": {
1339
+ "id": "Ylqg0nwFGPBL"
1340
+ },
1341
+ "outputs": [],
1342
+ "source": [
1343
+ "import matplotlib.pyplot as plt"
1344
+ ],
1345
+ "id": "Ylqg0nwFGPBL"
1346
+ },
1347
+ {
1348
+ "cell_type": "code",
1349
+ "execution_count": null,
1350
+ "metadata": {
1351
+ "id": "cD_u8JR4OYFL",
1352
+ "colab": {
1353
+ "base_uri": "https://localhost:8080/",
1354
+ "height": 282
1355
+ },
1356
+ "outputId": "61bab891-c2a2-44f4-afa4-c17e7aa37f05"
1357
+ },
1358
+ "outputs": [
1359
+ {
1360
+ "output_type": "display_data",
1361
+ "data": {
1362
+ "text/plain": [
1363
+ "<Figure size 576x360 with 0 Axes>"
1364
+ ]
1365
+ },
1366
+ "metadata": {}
1367
+ },
1368
+ {
1369
+ "output_type": "display_data",
1370
+ "data": {
1371
+ "text/plain": [
1372
+ "<Figure size 432x288 with 1 Axes>"
1373
+ ],
1374
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD4CAYAAADiry33AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAgAElEQVR4nO3deXiU1fn/8ffJTkI2su8J+5KQBMImAq6ACwKi4oJbrbau/Vartb/aVq3d1Fb9tn6r1l2wQEEUFMG2IBEFJCsh7ASSTBazB0L2mfP74xk0IEsgyzOZ3K/ryuUsDzN35pLPHM5znnMrrTVCCCGcl4vZBQghhOhZEvRCCOHkJOiFEMLJSdALIYSTk6AXQggn52Z2AScLDg7W8fHxZpchhBB9SmZmZpXWOuRUzzlc0MfHx5ORkWF2GUII0acopQpP95xM3QghhJOToBdCCCcnQS+EEE7O4ebohRD9U1tbGxaLhebmZrNLcWheXl5ER0fj7u7e6T8jQS+EcAgWiwVfX1/i4+NRSpldjkPSWlNdXY3FYiEhIaHTf06mboQQDqG5uZmgoCAJ+TNQShEUFHTO/+qRoBdCOAwJ+bM7n8/IaYK+rL6JZz7eRc2xVrNLEUIIh+I0QX+0uZ3XNx/iXxnFZpcihOijBg4caHYJPcJpgn54mC8TEwaxZFsRNps0UxFCiOOcJugBFk2Oo6imkfT9lWaXIoTow7TWPProoyQmJpKUlMSyZcsAKCsrY/r06aSkpJCYmMgXX3yB1Wrljjvu+PbYF154weTqv8+pllfOHhNO8EAPFm8t5KIRoWaXI4Q4T0+tyWdX6ZFufc3RkX78Zs6YTh37wQcfkJOTQ25uLlVVVUyYMIHp06fz/vvvM2vWLH75y19itVppbGwkJyeHkpISdu7cCUBdXV231t0dnGpE7+HmwsIJMWzYU4GlttHscoQQfdTmzZu56aabcHV1JSwsjBkzZrB9+3YmTJjAW2+9xZNPPkleXh6+vr4MHjyYgoICHnzwQdatW4efn5/Z5X+PU43oAW6aGMvfPz/IP78u4tFZI80uRwhxHjo78u5t06dPJz09nU8++YQ77riDhx9+mNtuu43c3FzWr1/PK6+8wvLly3nzzTfNLvUETjWiB4gO9OaSkaEs215Ma7vN7HKEEH3QtGnTWLZsGVarlcrKStLT05k4cSKFhYWEhYVx991388Mf/pCsrCyqqqqw2WwsWLCAZ555hqysLLPL/x6nG9GDcVL2P7srWJdfzjXJkWaXI4ToY+bPn8+WLVtITk5GKcWzzz5LeHg477zzDs899xzu7u4MHDiQd999l5KSEu68805sNmNg+Yc//MHk6r9Pae1YSxHT0tJ0VxuP2Gyai57/nHB/L5b/aEo3VSaE6Em7d+9m1KhRZpfRJ5zqs1JKZWqt0051vNNN3QC4uChumRTL14dq2Ft+1OxyhBDCVE4Z9ADXp8Xg4ebCkm2n7a4lhBD9gtMG/SAfD65OiuCDrBKOtbSbXY4QQpjGaYMeYNGUOBpa2vkwp8TsUoQQwjROHfSpMQGMjvDjvS2FONpJZyGE6C1OHfRKKW6dEsee8qNkFdWaXY4QQpjCqYMeYG5KJL6ebry3RU7KCiH6J6cPem8PNxaMj2ZtXjnVDS1mlyOEcBJn2rv+8OHDJCYm9mI1Z9apoFdKzVZK7VVKHVBKPX6K5z2VUsvsz29TSsV3eG6sUmqLUipfKZWnlPLqvvI755ZJsbRabSzPsPT2WwshhOnOugWCUsoVeBm4HLAA25VSq7XWuzocdhdQq7UeqpS6EfgTsFAp5QYsBm7VWucqpYKAtm7/Lc5iWJgvkwcPYsm2Qu6ZPhhXF+lLKYRD+/RxKM/r3tcMT4Ir/njapx9//HFiYmK4//77AXjyySdxc3Nj48aN1NbW0tbWxjPPPMPcuXPP6W2bm5u59957ycjIwM3Njb/85S9cfPHF
5Ofnc+edd9La2orNZmPlypVERkZyww03YLFYsFqt/OpXv2LhwoVd+rWhcyP6icABrXWB1roVWAqc/JvOBd6x314BXKqMDrYzgR1a61wArXW11tra5arPw6LJcVhqm0jfJ01JhBDft3DhQpYvX/7t/eXLl3P77bezatUqsrKy2LhxI4888sg5r+B7+eWXUUqRl5fHP//5T26//Xaam5t55ZVX+MlPfkJOTg4ZGRlER0ezbt06IiMjyc3NZefOncyePbtbfrfObGoWBXRsxGoBJp3uGK11u1KqHggChgNaKbUeCAGWaq2fPfkNlFL3APcAxMbGnuvv0CkzR4cT4uvJe1sLuXikNCURwqGdYeTdU1JTU6moqKC0tJTKykoCAwMJDw/npz/9Kenp6bi4uFBSUsI333xDeHh4p1938+bNPPjggwCMHDmSuLg49u3bx5QpU/jd736HxWLh2muvZdiwYSQlJfHII4/w85//nKuvvppp06Z1y+/W0ydj3YALgVvs/52vlLr05IO01q9prdO01mkhISE9UoiHmws3Tohh494KimukKYkQ4vuuv/56VqxYwbJly1i4cCFLliyhsrKSzMxMcnJyCAsLo7m5uVve6+abb2b16tUMGDCAK6+8kg0bNjB8+HCysrJISkriiSee4Omnn+6W9+pM0JcAMR3uR9sfO+Ux9nl5f6AaY/SfrrWu0lo3AmuBcV0t+nzdNDEWBbz/dZFZJQghHNjChQtZunQpK1as4Prrr6e+vp7Q0FDc3d3ZuHEjhYXnvkx72rRpLFmyBIB9+/ZRVFTEiBEjKCgoYPDgwTz00EPMnTuXHTt2UFpaire3N4sWLeLRRx/ttr3tOxP024FhSqkEpZQHcCOw+qRjVgO3229fB2zQxkTWeiBJKeVt/wKYAezCJJEBA7h0VBjLtxfT0m7KqQIhhAMbM2YMR48eJSoqioiICG655RYyMjJISkri3XffZeTIc+9ad99992Gz2UhKSmLhwoW8/fbbeHp6snz5chITE0lJSWHnzp3cdttt5OXlMXHiRFJSUnjqqad44oknuuX36tR+9EqpK4EXAVfgTa3175RSTwMZWuvV9iWT7wGpQA1wo9a6wP5nFwG/ADSwVmv92Jneqzv2oz+T9H2V3Pbm17x0YwpzU6J67H2EEOdG9qPvvHPdj75THaa01msxpl06PvbrDrebgetP82cXYyyxdAgXDg0mLsibxVsLJeiFEP2CU7YSPBMXF8WiSXH8bu1u9pQfYWS443VsF0L0DXl5edx6660nPObp6cm2bdtMqujU+l3QA1w3PprnP9vL4q2FPDMvyexyhBB2WmuMS3D6hqSkJHJycnr1Pc9nJ16n3+vmVAJ9PLh6bCSrskpokKYkQjgELy8vqqurZUvxM9BaU11djZfXue0k0y9H9AC3ToljZZaFVdkl3Do5zuxyhOj3oqOjsVgsVFbK1etn4uXlRXR09Dn9mX4b9MnR/iRG+bF4SyGLJsX2qX8uCuGM3N3dSUhIMLsMp9Qvp27A3pRkchx7vzlKRqE0JRFCOK9+G/QAc5Ij8fWSpiRCCOfWr4Pe28ON68ZH8+nOMqqkKYkQwkn166AHuGVSHG1WzbLtxWc/WAgh+qB+H/RDQwdywZAg3t9WhNUmy7qEEM6n3wc9GE1JSuqa+HxvhdmlCCFEt5OgBy4fHUaovSmJEEI4Gwl6wN3VhRsnxrJpXyVF1dKURAjhXCTo7W6aGIOLUiz5Wkb1QgjnIkFvF+E/gMtGhfKvDAvNbdKURAjhPCToO7h1cjw1x1r5dGeZ2aUIIUS3kaDv4IIhQSQE+7B4q/SUFUI4Dwn6DlxcFLdMiiWzsJZdpUfMLkcIIbqFBP1JrhsfjaebC4u3yUlZIYRzkKA/SYC3B9ckR/JhdglHm9vMLkcIIbpMgv4Ubp0SR2OrlVXZJWaXIoQQXSZBfwpjowMYG+3Pe1sKpa2ZEKLPk6A/jUWT49hf0cDXh2rMLkUIIbpE
gv405oyNxM/LTfa/EUL0eRL0pzHAw5Xr02JYn19OxdFms8sRQojzJkF/BrdMiqXNqlkuTUmEEH2YBP0ZDA4ZyIVDg6UpiRCiT5OgP4tFk2MprW9mwx5pSiKE6Jsk6M/islFhhPlJUxIhRN8lQX8Wbq4u3DQxlvR9lRRWHzO7HCGEOGcS9J1w08RYXF0US7bJrpZCiL5Hgr4Twvy8mDk6jOUZxdKURAjR50jQd9Ktk+Ooa2zjkx3SlEQI0bdI0HfSlCFBDA7xke2LhRB9jgR9JymlWDQpjuyiOnaW1JtdjhBCdFqngl4pNVsptVcpdUAp9fgpnvdUSi2zP79NKRVvfzxeKdWklMqx/7zSveX3rgXjo/Fyd2GJjOqFEH3IWYNeKeUKvAxcAYwGblJKjT7psLuAWq31UOAF4E8dnjuotU6x//y4m+o2hf8Ad+YmR/FhdilHpCmJEKKP6MyIfiJwQGtdoLVuBZYCc086Zi7wjv32CuBSpZTqvjIdx6LJcTS1Wfkg02J2KUII0SmdCfoooOOuXhb7Y6c8RmvdDtQDQfbnEpRS2UqpTUqpaad6A6XUPUqpDKVURmVl5Tn9Ar0tKdqf5JgAFm8rkqYkQog+oadPxpYBsVrrVOBh4H2llN/JB2mtX9Nap2mt00JCQnq4pK5bNCmWAxUNbC2QpiRCCMfXmaAvAWI63I+2P3bKY5RSboA/UK21btFaVwNorTOBg8DwrhZttjnJkfgPcGex7H8jhOgDOhP024FhSqkEpZQHcCOw+qRjVgO3229fB2zQWmulVIj9ZC5KqcHAMKCge0o3j5e7KzekRRtNSY5IUxIhhGM7a9Db59wfANYDu4HlWut8pdTTSqlr7Ie9AQQppQ5gTNEcX4I5HdihlMrBOEn7Y621U8x33DwpjnabZqk0JRFCODjlaCcU09LSdEZGhtlldMqtb2xj/zcNbP75xbi5yrVnQgjzKKUytdZpp3pO0qkLFk2Oo/xIM/+VpiRCCAcmQd8Fl44MJcLfS07KCiEcmgR9FxxvSvLF/ioOVUlTEiGEY5Kg76IbJ8Tg5qJYIqN6IYSDkqDvolA/L2aNCedfmRZpSiKEcEgS9N1g0eQ46pvaWJNbanYpQgjxPRL03WDy4EEMDR3IYukpK4RwQBL03cBoShJLbnEdeRZpSiKEcCwS9N3k2vHRDHB3laWWQgiHI0HfTfy83JmXGslHuSXUN0pTEiGE45Cg70a3TIqjuc3GyixpSiKEcBwS9N0oMcqf1NgAFm8tpKlVlloKIRyDBH03u2faYAqqjnH5C5v4LL9culAJIUwnQd/NrkiK4J93T8bbw5V73svkzre3y/YIQghTSdD3gClDgvjkoWk8cdUoMg7XMuuFdJ5bv4fG1nazSxNC9EMS9D3E3dWFH04bzIZHZnDV2Ahe3niQy/+Szqd5ZTKdI4ToVRL0PSzUz4sXFqaw/EdT8PVy494lWdz25tccrGwwuzQhRD8hQd9LJiYM4uMHL+Q3c0aTU1TH7BfT+eOnezjWItM5Qoie5VxBX+/Y69fdXF24c2oCG352EdckR/HKpoNc9pdNfLyjVKZzhBA9xnmCviQLXhwLH90PR8rMruaMQnw9+fMNyay8dwqB3h488H42t7y+jQMVR80uTQjhhJwn6AclwOR7Ycdy+Os42Ph7aHHsefDxcYNY8+CF/HbuGHaW1DP7xS/4/drdNMh0jhCiGylHmzJIS0vTGRkZ5/8CNYfgv09D/gfgEwqX/BJSFoGrW/cV2QOqG1p4dt1elmUUE+rryS+vGsU1yZEopcwuTQjRByilMrXWaad8zumC/rji7fDZE1C8FUJGwuW/hWGXg4MHZ3ZRLb/+KJ+8knomJQzi6bmJjAj3NbssIYSD659BD6A17F4D//kN1BRAwgyY+QxEjO2e1+8hVptm6fYinlu/l6PN7dw+JZ7/uXwYfl7uZpcmhHBQ/Tfoj2tvhYw3YdMfoakOkm+CS54A/6jufZ9uVnus
lWfX72Xp9iKCfDz5f1eOZH5qlEznCCG+R4L+uKY6+OLPsO0VUC4w5QGY+hPw8uuZ9+smOyx1/OqjfHKL65gQH8hT1yQyOtKxaxZC9C4J+pPVFsKG30Lev8A7GC7+BYy7w6FP2NpsmuUZxfxp3R7qm9q4bUo8P718OP4DZDpHCCFBf3olmfDZr6DwSwgeDpc/DcNnO/QJ27rGVv782T6WbCtkkI8HP589kgXjonFxcdyahRA9T4L+TLSGvWvh37+G6gMQPw1m/hYiU3uvhvOws6SeX3+0k6yiOsbFBvD03EQSo/zNLksIYRIJ+s6wtkHm2/D5H6CxGsYuhEt+BQExvV9LJ9lsmpVZFv746R5qG1u5ZVIcj8wcToC3h9mlCSF6mQT9uWiuh80vwtb/M0b7U+6DC38KXo47Wq5vauOFf+/j3S2HCfD24LFZI7ghLUamc4ToRyToz0ddMWx4BnYsBe8gmPE4pN0Jro578nNX6RF+s3on2w/XkhwTwNPXjCE5JsDssoQQvUCCvitKc4wrbA9/AUFD4bKnYORVDnvCVmvNhzkl/H7tHiqPtnBVUgQPzxzOkJCBZpcmhOhBEvRdpTXs/8xYoVO1F2IvMK6wjR5vdmWndbS5jX+kF/D65kO0tNu4IS2ahy4dRoT/ALNLE0L0gDMFfad2r1RKzVZK7VVKHVBKPX6K5z2VUsvsz29TSsWf9HysUqpBKfWz8/kFTKcUDJ8F934FV78A1fvh9UtgxV3GmnwH5OvlzsMzR7Dp0Yu5dXIcKzItzHjuc36/dje1x1rNLk8I0YvOOqJXSrkC+4DLAQuwHbhJa72rwzH3AWO11j9WSt0IzNdaL+zw/ApAA9u01s+f6f0cckR/spaj8OVL8NXfQFth0o9g2iMwINDsyk6ruKaRF/+zn1XZFnw83Lh7+mDuujABH0/HvUhMCNF5XR3RTwQOaK0LtNatwFJg7knHzAXesd9eAVyq7BuyKKXmAYeA/PMp3iF5+hp75TyYCUnXG4H/v6mw9e/GvjoOKGaQN3++IZl1/zOdKUOC+Mu/9zHjuY289eUhWtqtZpcnhOhBnQn6KKC4w32L/bFTHqO1bgfqgSCl1EDg58BTXS/VAflHwbz/gx+lQ/hYWPc4vDwR8leBzTHDc3iYL6/dlsYH913A0NCBPLVmF5c8v4mVmRasNsc6XyOE6B493WHqSeAFrfUZWz0ppe5RSmUopTIqKyt7uKQeEDEWbvsIblkBbl7wrzvgpRRIfx4aKsyu7pTGxQbyz7sn8+4PJhLo484j/8rlipfSWZ9fLv1rhXAynZmjnwI8qbWeZb//CwCt9R86HLPefswWpZQbUA6EAOnA8UtLAwAb8Gut9d9O9359Yo7+TKztsPcT2P4GHNoELu4wag5MuAvipjrkskybTfPpznL+/NleCqqOkRITwGOzR3DBkGCzSxNCdFKXllfag3sfcClQgnEy9matdX6HY+4HkjqcjL1Wa33DSa/zJNDgFCdjO6tqv7EPfs4S44rbkJGQdhckL3TIK23brTZWZFp46b/7KatvZtqwYB6bNZKkaMerVQhxoi6vo1dKXQm8CLgCb2qtf6eUehrI0FqvVkp5Ae8BqUANcKPWuuCk13iS/hb0x7U2Gj1st78BpVng7m2cxJ1wF0Qkm13d9zS3WXlvSyEvf36AusY2uehKiD5ALphyJCVZkPEG5K2E9iaISjMCf8x8cHesi5mONLfxeoeLrq4fH81PLpOLroRwRBL0jqipFnKXGqP86v3GGvyUWyDtBxA0xOzqTlDV0MLfNhzg/W1FoOD2KXHce9FQBvnILplCOAoJekemtbGPzvY3YM/HYGuHwRcbo/zhVzhU16uOF115e7hx97TB3DUtgYFy0ZUQppOg7yuOlkPWu8a++EdKwDcSxt8O424Hvwizq/vWvm+O8vz6vXy26xuCfDx44JKh3DwpFk83V7NLE6LfkqDva6ztsH+9Mco/+F9QrsaOmRPugoQZ
DrNEM6uolufW7WVLQTVRAQP46eXDmZ8ahavsgy9Er5Og78tqCiDjLcheDE01xlbJaT+AlJsdYm8drTWbD1Tx7Lq95JXUMyx0II/MHMGsMWEoB/lCEqI/kKB3Bm3NsOtDY5Rv+dq4AjdxgTHKjzJ/u2StjYuunl9vXHSVHBPAz2eN4IKhctGVEL1Bgt7ZlOcZgb9jObQdg4gUI/ATF4CHj6mltVttrMyy8OJ/vrvo6uHLh5Maa/6/PoRwZhL0zqr5COxYZoR+5W7w9IeUm4ypnZAR5pbWZmXx1kJe3niA2sY2JiUM4sczhnDRiBCZ0hGiB0jQOzutoWiLEfi7PgJbG8RPMwJ/5NXgZt5694aWdpZ+XcQbmw9RVt/MiDBf7pk+mDnJkXi49fSeekL0HxL0/UlDJWS/B5lvQV0ReAcbJ27H32HqhVit7TbW5JbyavpB9n3TQLifF3ddmMCNE2Pw9XLchutC9BUS9P2RzQoHNxqBv/dToxNW/DQj8EfNATdPU8rSWvP53kpeTT/I1oIafL3cWDQ5jjsviCfUz8uUmoRwBhL0/d3RcmN5Zta7UFcIAwYZo/xxt0PIcNPKyimu47X0g3y6sxx3FxeuHRfF3dMHy+ZpQpwHCXphsNng0OfGlbd7PjG2W4ibah/lXwPu5oyoD1cd4x9fFLAi00Kr1cZlo8L48YzBjI8bZEo9QvRFEvTi+xoqjH3yM9+B2kPgFfDdKD90pCklVTW08O5Xh3lnSyH1TW2kxQXyoxlDuHRkKC5yta0QZyRBL07PZjM2Vct8G3avMVbsxEw2Rvlj5pmydfKxlnaWZxTz+heHKKlrYkiID/dMH8y81CjZT0eI05CgF51zrApy3jdCv+ag0QVr7EIj9MPG9Ho57VYbn+SV8eqmAnaVHSHU15M7pyZw86RY/AfISh0hOpKgF+dGazi8GbLeMdblW1sheoJ9lD+/16++Pb6fzqubCth8oIqBnm7cPCmWO6fGSxMUIewk6MX5O1YNO5Yao/yqfeDpB2NvMObyI8b2ejk7S+p5Nb2AT3aU4uqimJsSxT3TBzM8zLfXaxHCkUjQi67TGoq2GoGfvwqsLRA5zhjlJy4Az95dEllc08gbmw+xdHsRzW02LhkZyo+mD2ZiwiDZYkH0SxL0ons11hgbqmW+beyx4zHQaHY+/g6ITOnVUmqOtfLelkLe2XKYmmOtpMQE8KPpg5k5Jlz2xRf9igS96BlaQ/HXxlz+zg+MZucRyfZR/nXg5ddrpTS1WlmRZeEf6QUU1TQSH+TN3dMHs2BcNF7uslJHOD8JetHzmuog719Gk5SKfHD3gaQF9lH+uF7rimW1adbtLOfV9IPssNQTPNCDOy6IZ9HkOAK8pZm5cF4S9KL3aA0lmcYeOzs/gLZGI+gvehyGzey1wNdas7WghlfTD/L53kq8PVyZmxLJvJQoJsQPkguwhNORoBfmOL5f/lf/a+ykGZkKM34Ow2f3at/b3WVHeP2LQ6zNK6OpzUpUwADmpUYyPzWKoaGyWkc4Bwl6YS5rG+QuhfTnjE3VIpKNwB9xZa8G/rGWdv696xtWZZfwxf5KbBoSo/yYlxLFNcmRsnum6NMk6IVjsLYZI/z05439dcKTYMbjMPKqXg18gIqjzXycW8aHOSXssNTjomDq0GDmp0Yxa0w4Pp5uvVqPEF0lQS8ci7Ud8pYbI/yaAghLghmPGd2wXHq/69SBigY+yilhVXYJltomBri7MmtMGPNSo7hwaDBurtIJSzg+CXrhmKztsHMFbHrW2FsnLNEe+HNMCXybTZNZVMuq7BI+2VFGfVMbwQM9mJNszOcnRfnLxVjCYUnQC8dmbYedK40RfvV+CB1tBP6ouaYEPkBLu5XP91byYXYJ/91dQavVxuAQH+anRDEvNYqYQd6m1CXE6UjQi77BZjWWZKY/a+yrEzIKZjwKo+eBi3kXPdU3tfFpXhmrskvYdqgGgLS4QOalRnFVUgSBPrI+X5hPgl70
LTarsZ/Opmehai8EjzBG+GPmmxr4ACV1TcZ8flYJ+ysacHdVXDQilGtTo7h4ZKhchStMI0Ev+iabFXZ9aAR+5R4IHg7TH4PEa00PfK01u8qO8GF2CR/llFJxtAVfLzeuSopgXmoUE+WiLNHLJOhF32azwe6PjMCv2AVBw4wRfuIC0wMfjG0XvjpYxarsEtbvLOdYq5VIfy/mpkYxPzVKtlAWvUKCXjgHmw32rIHP/2TspxM0FKY/amyg5uoY694bW42Lsj7MLiF9fxVWm2Z0hB/XjotiTnIkYXJRlughEvTCudhssOdjY4T/TR4MGmwEftINDhP4YDQ7/zi3lFU5peQW151wUdbsxHC8PRynVtH3dTnolVKzgZcAV+B1rfUfT3reE3gXGA9UAwu11oeVUhOB144fBjyptV51pveSoBedZrPB3rWw6Y9QngeBCUbgj13oUIEPUFDZwIc5pazKtlBc04SPhytXJEWwYFw0kxJkPl90XZeCXinlCuwDLgcswHbgJq31rg7H3AeM1Vr/WCl1IzBfa71QKeUNtGqt25VSEUAuEKm1bj/d+0nQi3OmtRH4n/8RyndAYDxM+xkk3wiujtVEXGvN9sO1rMy08EleGQ0t7UQHDuDa1CiuHRdNfHDv9uMVzqOrQT8FYyQ+y37/FwBa6z90OGa9/ZgtSik3oBwI0R1eXCmVAGwFoiToRY/QGvatMwK/LAcC4mDaI5Bys8MFPhjNUj7bVc6KTAubD1ShtbE+f8H4aK4aG4Gfl+PVLBxXV4P+OmC21vqH9vu3ApO01g90OGan/RiL/f5B+zFVSqlJwJtAHHDrqaZulFL3APcAxMbGji8sLDyPX1MIO61h/2fw+R+gNBv8Y2H6I5B8M7g55sVN5fXNrMouYWWWhQMVDXi6uTBzTDgLxkUxbViItEUUZ2Vq0Hc4ZhTwDjBda918uveTEb3oNlrD/n8bc/glmeAbCVPug3G392qbw3OhtWaHpZ6VWRZW55ZS19hGqK8n81OjWDA+WpZqitNyiKkb+3EbgMe01qdNcgl60e20hoP/hc0vwuEvwNMfJvwAJv0YfMPNru60WtqtbNxTwYrMEj7fW0G7TZMU5c+CcVFckxLFINl6QXTQ1aB3w6hGOMYAAA9BSURBVDgZeylQgnEy9matdX6HY+4HkjqcjL1Wa32DfV6+2H4yNg7YgnHStur772SQoBc9qiQTvvxf2L0aXNyME7YXPATBw8yu7IyqGlpYnVPKyiwL+aVHcHdVXDIylAXjorloRCgebrKVcn/XHcsrrwRexFhe+abW+ndKqaeBDK31aqWUF/AekArUADdqrQvs0zyPA22ADXhaa/3hmd5Lgl70iuqDsOVlyFkC7S1G85OpP4GYiWZXdlZ7yo+wMtPCquxSqhpaGOTjwTXJkVw3PpoxkX6ylXI/JRdMCXE6DZXw9WvGT3MdxE4xAn/YLNO2SO6sdquN9P2VrMws4d+7vqHVamNEmC8LxkcxLyVKWiP2MxL0QpxNSwNkL4Ytf4P6YmPHzKkPQdL14OZpdnVnVd/YxpodpXyQZSGryLgKd/rwEBaMi+by0WGyq2Y/IEEvRGdZ2yD/Q/jyJWN7Bd8ImHwvjL8DvPzNrq5TDlY28EGWhVVZJZTWN+Pr5cbVYyO5bnwU42IDZWrHSUnQC3GutIaDG4zAP7QJPP0g7U6YdC/4RZhdXafYbJotBdWszLTw6c5ymtqsJAT7cG1qFPPHRREdKF2ynIkEvRBdUZptrNTZ9SEoV0heaKzUCRlhdmWd1tDSzqd5ZazMsrC1wOiSNSE+kDnJkVyRGEGIr+NPT4kzk6AXojvUHDJW6mQvhvYmGHGlceI2drLZlZ2T4ppGPsop4eMdZewpP4qLgguGBDMnOYJZY8IJ8Jb1+X2RBL0Q3elYFXz9D2OlTlMNxEwyAn/4FQ6/Uudk+745yse5pazOLeVwdSPurorpw0KYkxzJZaPDGOjpWLuAitOToBeiJ7Qe
g+wlsOWvUFdkdL6a+pCxTXIfWKnTkdaa/NIjrMktZU1uKaX1zXi6uXDpqFDmjI2Ufrh9gAS9ED3J2m7M33/5krFN8sAw+0qdO2FAgNnVnTObTZNdXMua3DI+3lFGVUMLPh6uzBwTzpzkCC4cGiJX4jogCXoheoPWUPC5EfgFG8HDF9LugMn3gV+k2dWdF6tNs62gmjU7SlmbV059Uxv+A9y5IjGcOcmRTB4cJDtrOggJeiF6W1musVIn/wNjpc7YG+CCByF0lNmVnbfWdhtfHqhiTW4p6/ONJujBAz25KskI/XGxgdIpy0QS9EKYpfYwbPk/yHrXWKkzbCZEpcHAEPAJAZ/Q7257DIQ+cjFTc5uVz/dWsCa3jP/s/oaWdhuR/l5cnRzJnLGRJEbJnju9TYJeCLMdq4btr0PmW3C07NTHuA0wAv/bLwH7z8DQ798fEAgujnFytKGlnf/u/oY1uaVs2ldJm1UTH+TNnORI5iRHyh76vUSCXghH0t4KjVXQUGEs1TxWAccqT3G/0vivtn7/NZQLeAef9MUQeuJtn+DvviR6aRVQfWMb6/PLWbOjlC8PVGHTMCLMlznJEVw9NlJ64vYgCXoh+iqbzdhV89svgsrvfk71xdB27NSv4+nfIfiDjW5bgy8yfjx6ZiuEyqMtrNtZxprcMr4+bFyNOzbanzljI7lqbASRAQN65H37Kwl6IfqL1mP2L4KqDl8MFSfdr4S6YuNLwW0ADLkYRlwBw2cbXwQ9oLSuibV5ZazJLSXXUg8YWzDMToxgUsIgRob74uYqSza7QoJeCHGi9lYo/BL2fmr81BcBCqLTjNAfcSWEjOyRk8OF1cf4eIcR+nvKjwLg4+FKSmwA4+MGkRYXSGpsAL5e7t3+3s5Mgl4IcXpawzf5sHet8VOabTwemGAE/ogrjIYsrt2/HUJJXRMZh2vILKwl43Ate8qPYNPG98uIMF/S4gNJixvE+LhAogMHyEqeM5CgF0J03pFS2LfOGOkXbAJrC3gFGEtDR1wBQy8DL78eeeujzW3kFNeRWVhLZmEt2UV1NLS0AxDm58n4uMBvR/2jI/1wl+meb0nQCyHOT0uDcZXvnrVG+DfVgIs7JEwzRvvDZ0NATI+9vdWm2VN+5NsRf2ZhLSV1TQAMcHclOcb/2xH/uNhA/L3773SPBL0QoutsVij+2j7F8ylU7zceD0/6boonIqXHL/oqq286Ifh3lR3BajNybHjYwG9H/GnxgcQO8u430z0S9EKI7le1/7vQL94G2mYs2zx+MjdhWq+s3z/W0k6upY7Mw7VkFNaSVVTL0WZjuid4oCfj4wKMUX98IImR/k67IZsEvRCiZx2rgv2fGcF/YIOxdNNjIAy5xAj9YTPBJ6hXSrHZNPsqjpJxuJasQiP8i2oaAfB0cyE5OoBxcYGkxQUyPi6QQB/naLQiQS+E6D1tzXAo/bvRfkO5cSVv7JTvRvtBQ3q1pIojzcZ0j/0nv6Sedvt0z5AQHyYmDGLq0GAuGBLMoD4a/BL0Qghz2GxQlvPdev1v8ozHg4cboT9qLkSP7/Wymtus5BbXkWFf3bP9UA1H7at7xkT6MXVoMFOHBjMxfhADPBxjT6GzkaAXQjiGuiJ76K+Fw5vB1g5xF8L0nxnbMZh04rTdaiOvpJ4vD1Sx+UAVWYV1tFpteLi6MC4ugAuHBnPB0GDGRvk77BW8EvRCCMfTXA85/zQatRwthegJMP1RYz7f5JUyja3tbD9cy1f24M8vPQKAr6cbk4cEcaF9xD8kxMdhVvVI0AshHFd7C+Qsgc0vGCP+8LFG4I+82mGarVc3tLCloJovD1Tx5YHqb0/uhvl5MnVo8LfBH+bnZVqNEvRCCMdnbYMdy+GLP0PNQWOvnWk/g8RrHWbv/eOKqhv58qAx2v/qQBW1jW0ADAsd+O38/qTBg/Drxf16JOiFEH2HzQr5qyD9eajcDYOGwLSHYexCcHW8K19tNs3u8iP2+f1qvj5UTXOb
DVcXRXK0/7fz+6mxAXi69dwXlgS9EKLvsdlg7yew6Vko3wH+sXDh/0Dqol5rpHI+WtqtZBXW8ZV9xJ9bXIdNG1s2TEwY9O00z8hw327tsStBL4Tou7SG/f+G9GfBsh18I2DqT2Dc7T3WNKU71Te1se34/P7Bag5UNAAQ5OPBlA4ndmMGde13kaAXQvR9WsOhTcaUzuEvjFaKFzwAE34Inn2nL215fbP9pK4x4q842gJAXJA3N6TFcP/FQ8/rdSXohRDOpXALpD8HB/9rbKE8+T6YdI/RNL0P0VpzsLKBzfuN+f3REb48PHPEeb2WBL0QwjmVZEL6n425fE8/mHi3Efo+wWZX1uvOFPSOsUhVCCHOR9R4uOl9+PFmGHopfPEXeDEJ1v8SjpabXZ3D6FTQK6VmK6X2KqUOKKUeP8XznkqpZfbntyml4u2PX66UylRK5dn/e0n3li+EEBh74l//Nty/DUZdA1v/Di+OhbWPQr3F7OpMd9agV0q5Ai8DVwCjgZuUUqNPOuwuoFZrPRR4AfiT/fEqYI7WOgm4HXivuwoXQojvCRkB174KD2ZA8kLIeAteSoHVD0LNIbOrM01nRvQTgQNa6wKtdSuwFJh70jFzgXfst1cAlyqllNY6W2tdan88HxiglHLcBbBCCOcwaDBc81d4KBvS7oTcZfDX8fDBj6Byn9nV9brOtHWPAoo73LcAk053jNa6XSlVDwRhjOiPWwBkaa1bTn4DpdQ9wD0AsbGxnS5eCCHOKCAGrnwOpj0CX/0VMt6EHctgzDxje4XwxN6rxdoOrQ3Qesz+Y7/d1vjdbf9oo1lLN+tM0HeZUmoMxnTOzFM9r7V+DXgNjFU3vVGTEKIf8Q2HWb+DCx+GrS/DtteMbRZGXGlskRzVYU98raGt6cQwPl0wn/zcCbcbT7xv/d4Y9/tGzzMt6EuAjm3eo+2PneoYi1LKDfAHqgGUUtHAKuA2rfXBLlcshBDnyycILv01XPAgfP0P2PIy/OMSCIiF9tbvgpnOjjcVePic9DPQWM/vHw3uJz1+8nEePsbVvcdve/r1yK/dmaDfDgxTSiVgBPqNwM0nHbMa42TrFuA6YIPWWiulAoBPgMe11l92X9lCCNEFAwJhxmMw+V5jOqc878Twdfc+RTiffNsb3AY4zFbKZ3LWoLfPuT8ArAdcgTe11vlKqaeBDK31auAN4D2l1AGgBuPLAOABYCjwa6XUr+2PzdRaV3T3LyKEEOfM09fYN8fJyZWxQgjhBOTKWCGE6Mck6IUQwslJ0AshhJOToBdCCCcnQS+EEE5Ogl4IIZycBL0QQjg5h1tHr5SqBAq78BLBnLiZWn8mn8WJ5PP4jnwWJ3KGzyNOax1yqiccLui7SimVcbqLBvob+SxOJJ/Hd+SzOJGzfx4ydSOEEE5Ogl4IIZycMwb9a2YX4EDksziRfB7fkc/iRE79eTjdHL0QQogTOeOIXgghRAcS9EII4eScJuiVUrOVUnuVUgeUUo+bXY+ZlFIxSqmNSqldSql8pZTzd1Y4C6WUq1IqWyn1sdm1mE0pFaCUWqGU2qOU2q2UmmJ2TWZSSv3U/vdkp1Lqn0opL7Nr6m5OEfRKKVfgZeAKYDRwk1JqtLlVmaodeERrPRqYDNzfzz8PgJ8Au80uwkG8BKzTWo8EkunHn4tSKgp4CEjTWididNG78cx/qu9xiqAHJgIHtNYFWutWYCkw1+SaTKO1LtNaZ9lvH8X4ixxlblXmsTeovwp43exazKaU8gemY7T/RGvdqrWuM7cq07kBA5RSboA3UGpyPd3OWYI+CijucN9CPw62jpRS8UAqsM3cSkz1IvAYYDO7EAeQAFQCb9mnsl5XSvmYXZRZtNYlwPNAEVAG1GutPzO3qu7nLEEvTkEpNRBYCfyP1vqI2fWYQSl1NVChtc40uxYH4QaMA/6utU4FjgH99pyWUioQ41//CUAk4KOUWmRuVd3PWYK+BIjpcD/a/li/pZRyxwj5JVrr
D8yux0RTgWuUUocxpvQuUUotNrckU1kAi9b6+L/wVmAEf391GXBIa12ptW4DPgAuMLmmbucsQb8dGKaUSlBKeWCcTFltck2mUUopjDnY3Vrrv5hdj5m01r/QWkdrreMx/r/YoLV2uhFbZ2mty4FipdQI+0OXArtMLMlsRcBkpZS3/e/NpTjhyWk3swvoDlrrdqXUA8B6jLPmb2qt800uy0xTgVuBPKVUjv2x/6e1XmtiTcJxPAgssQ+KCoA7Ta7HNFrrbUqpFUAWxmq1bJxwOwTZAkEIIZycs0zdCCGEOA0JeiGEcHIS9EII4eQk6IUQwslJ0AshhJOToBdCCCcnQS+EEE7u/wNDG6QwfX0sHgAAAABJRU5ErkJggg==\n"
1375
+ },
1376
+ "metadata": {
1377
+ "needs_background": "light"
1378
+ }
1379
+ }
1380
+ ],
1381
+ "source": [
1382
+ "plt.figure(figsize=(8, 5))\n",
1383
+ "pd.DataFrame(history.history).plot()\n",
1384
+ "plt.show()"
1385
+ ],
1386
+ "id": "cD_u8JR4OYFL"
1387
+ },
1388
+ {
1389
+ "cell_type": "markdown",
1390
+ "metadata": {
1391
+ "id": "OJxNheOEVGoD"
1392
+ },
1393
+ "source": [
1394
+ "# 4. Make predictions"
1395
+ ],
1396
+ "id": "OJxNheOEVGoD"
1397
+ },
1398
+ {
1399
+ "cell_type": "code",
1400
+ "execution_count": null,
1401
+ "metadata": {
1402
+ "id": "qAlM31wVVFIx",
1403
+ "colab": {
1404
+ "base_uri": "https://localhost:8080/"
1405
+ },
1406
+ "outputId": "86d60e93-348e-478b-991c-d5e86693157a"
1407
+ },
1408
+ "outputs": [
1409
+ {
1410
+ "output_type": "execute_result",
1411
+ "data": {
1412
+ "text/plain": [
1413
+ "<tf.Tensor: shape=(1800,), dtype=int64, numpy=array([ 7, 318, 0, ..., 0, 0, 0])>"
1414
+ ]
1415
+ },
1416
+ "metadata": {},
1417
+ "execution_count": 64
1418
+ }
1419
+ ],
1420
+ "source": [
1421
+ "text = vectorizer(\"you shit\")\n",
1422
+ "text"
1423
+ ],
1424
+ "id": "qAlM31wVVFIx"
1425
+ },
1426
+ {
1427
+ "cell_type": "code",
1428
+ "execution_count": null,
1429
+ "metadata": {
1430
+ "id": "5Nlk_v_Da-Pi",
1431
+ "colab": {
1432
+ "base_uri": "https://localhost:8080/"
1433
+ },
1434
+ "outputId": "ad328b76-840f-44e9-d048-23d6d5443cd9"
1435
+ },
1436
+ "outputs": [
1437
+ {
1438
+ "output_type": "execute_result",
1439
+ "data": {
1440
+ "text/plain": [
1441
+ "array([[ 7, 318, 0, ..., 0, 0, 0]])"
1442
+ ]
1443
+ },
1444
+ "metadata": {},
1445
+ "execution_count": 65
1446
+ }
1447
+ ],
1448
+ "source": [
1449
+ "np.expand_dims(text, 0)"
1450
+ ],
1451
+ "id": "5Nlk_v_Da-Pi"
1452
+ },
1453
+ {
1454
+ "cell_type": "code",
1455
+ "execution_count": null,
1456
+ "metadata": {
1457
+ "id": "ReideBKOVhAY",
1458
+ "colab": {
1459
+ "base_uri": "https://localhost:8080/"
1460
+ },
1461
+ "outputId": "5e6e9aab-332b-4de0-a590-f55d2dc6bfdf"
1462
+ },
1463
+ "outputs": [
1464
+ {
1465
+ "output_type": "execute_result",
1466
+ "data": {
1467
+ "text/plain": [
1468
+ "array([[0.9876286 , 0.15251058, 0.9701179 , 0.0023339 , 0.33286613,\n",
1469
+ " 0.00344882]], dtype=float32)"
1470
+ ]
1471
+ },
1472
+ "metadata": {},
1473
+ "execution_count": 66
1474
+ }
1475
+ ],
1476
+ "source": [
1477
+ "res = model.predict(np.expand_dims(text, 0))\n",
1478
+ "res"
1479
+ ],
1480
+ "id": "ReideBKOVhAY"
1481
+ },
1482
+ {
1483
+ "cell_type": "code",
1484
+ "execution_count": null,
1485
+ "metadata": {
1486
+ "id": "-uAI_l6XVvMC",
1487
+ "colab": {
1488
+ "base_uri": "https://localhost:8080/"
1489
+ },
1490
+ "outputId": "76562f4d-0884-4e9b-96e9-351bc933a66e"
1491
+ },
1492
+ "outputs": [
1493
+ {
1494
+ "output_type": "execute_result",
1495
+ "data": {
1496
+ "text/plain": [
1497
+ "Index(['toxic', 'severe_toxic', 'obscene', 'threat', 'insult',\n",
1498
+ " 'identity_hate'],\n",
1499
+ " dtype='object')"
1500
+ ]
1501
+ },
1502
+ "metadata": {},
1503
+ "execution_count": 67
1504
+ }
1505
+ ],
1506
+ "source": [
1507
+ "df.columns[2:]"
1508
+ ],
1509
+ "id": "-uAI_l6XVvMC"
1510
+ },
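The six probabilities returned by `model.predict` line up with the label columns shown above. A small sketch of turning them into readable labels follows, using the probabilities printed earlier and a 0.5 threshold. The function `flagged_labels` is a hypothetical helper, not part of the notebook.

```python
LABELS = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']


def flagged_labels(probs, threshold=0.5):
    # Keep the names of all labels whose probability exceeds the threshold.
    return [name for name, p in zip(LABELS, probs) if p > threshold]


probs = [0.9876286, 0.15251058, 0.9701179, 0.0023339, 0.33286613, 0.00344882]
print(flagged_labels(probs))  # ['toxic', 'obscene']
```

Lowering the threshold trades precision for recall. At 0.3, for example, `insult` would also be flagged for this input.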
1511
+ {
1512
+ "cell_type": "code",
1513
+ "execution_count": null,
1514
+ "metadata": {
1515
+ "id": "ROi-r6MGVT1T"
1516
+ },
1517
+ "outputs": [],
1518
+ "source": [
1519
+ "batch_X, batch_y = test.as_numpy_iterator().next()"
1520
+ ],
1521
+ "id": "ROi-r6MGVT1T"
1522
+ },
1523
+ {
1524
+ "cell_type": "code",
1525
+ "execution_count": null,
1526
+ "metadata": {
1527
+ "id": "vcTgLwQjYehR",
1528
+ "colab": {
1529
+ "base_uri": "https://localhost:8080/"
1530
+ },
1531
+ "outputId": "8b25d0d8-bbe4-49ac-e67f-e8a929524bae"
1532
+ },
1533
+ "outputs": [
1534
+ {
1535
+ "output_type": "execute_result",
1536
+ "data": {
1537
+ "text/plain": [
1538
+ "array([[0, 0, 0, 0, 0, 0],\n",
1539
+ " [0, 0, 0, 0, 0, 0],\n",
1540
+ " [0, 0, 0, 0, 0, 0],\n",
1541
+ " [0, 0, 0, 0, 0, 0],\n",
1542
+ " [0, 0, 0, 0, 0, 0],\n",
1543
+ " [1, 0, 1, 0, 1, 0],\n",
1544
+ " [0, 0, 0, 0, 0, 0],\n",
1545
+ " [0, 0, 0, 0, 0, 0],\n",
1546
+ " [0, 0, 0, 0, 0, 0],\n",
1547
+ " [0, 0, 0, 0, 0, 0],\n",
1548
+ " [0, 0, 0, 0, 0, 0],\n",
1549
+ " [0, 0, 0, 0, 0, 0],\n",
1550
+ " [0, 0, 0, 0, 0, 0],\n",
1551
+ " [0, 0, 0, 0, 0, 0],\n",
1552
+ " [0, 0, 0, 0, 0, 0],\n",
1553
+ " [0, 0, 0, 0, 0, 0]])"
1554
+ ]
1555
+ },
1556
+ "metadata": {},
1557
+ "execution_count": 69
1558
+ }
1559
+ ],
1560
+ "source": [
1561
+ "pred = (model.predict(batch_X) > 0.5).astype(int)\n",
1562
+ "pred"
1563
+ ],
1564
+ "id": "vcTgLwQjYehR"
1565
+ },
1566
+ {
1567
+ "cell_type": "code",
1568
+ "execution_count": null,
1569
+ "metadata": {
1570
+ "id": "kVWGgNWxc1LY",
1571
+ "colab": {
1572
+ "base_uri": "https://localhost:8080/"
1573
+ },
1574
+ "outputId": "8be7ac60-e9d2-4007-9c06-358f1a58ab89"
1575
+ },
1576
+ "outputs": [
1577
+ {
1578
+ "output_type": "execute_result",
1579
+ "data": {
1580
+ "text/plain": [
1581
+ "array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
1582
+ " 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
1583
+ " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
1584
+ " 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,\n",
1585
+ " 0, 0, 0, 0, 0, 0, 0, 0])"
1586
+ ]
1587
+ },
1588
+ "metadata": {},
1589
+ "execution_count": 70
1590
+ }
1591
+ ],
1592
+ "source": [
1593
+ "pred = pred.flatten()\n",
1594
+ "pred"
1595
+ ],
1596
+ "id": "kVWGgNWxc1LY"
1597
+ },
1598
+ {
1599
+ "cell_type": "markdown",
1600
+ "metadata": {
1601
+ "id": "INW-U2pcaXHV"
1602
+ },
1603
+ "source": [
1604
+ "# 5. Evaluate model"
1605
+ ],
1606
+ "id": "INW-U2pcaXHV"
1607
+ },
1608
+ {
1609
+ "cell_type": "code",
1610
+ "execution_count": null,
1611
+ "metadata": {
1612
+ "id": "6UfuO4WBaWre"
1613
+ },
1614
+ "outputs": [],
1615
+ "source": [
1616
+ "from tensorflow.keras.metrics import Precision, Recall, CategoricalAccuracy"
1617
+ ],
1618
+ "id": "6UfuO4WBaWre"
1619
+ },
1620
+ {
1621
+ "cell_type": "code",
1622
+ "execution_count": null,
1623
+ "metadata": {
1624
+ "id": "zJ-1rJDuaJCp"
1625
+ },
1626
+ "outputs": [],
1627
+ "source": [
1628
+ "pre = Precision()\n",
1629
+ "re = Recall()\n",
1630
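+ "acc = CategoricalAccuracy()  # caution: compares argmax positions, so on flattened 0/1 multi-label targets this is not per-label accuracy; BinaryAccuracy is the usual choice"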
1631
+ ],
1632
+ "id": "zJ-1rJDuaJCp"
1633
+ },
1634
+ {
1635
+ "cell_type": "code",
1636
+ "execution_count": null,
1637
+ "metadata": {
1638
+ "id": "sQFmLI5JbQJZ"
1639
+ },
1640
+ "outputs": [],
1641
+ "source": [
1642
+ "for batch in test.as_numpy_iterator():\n",
1643
+ " X_true, y_true = batch\n",
1644
+ " pred = model.predict(X_true)\n",
1645
+ "\n",
1646
+ " y_true = y_true.flatten()\n",
1647
+ " pred = pred.flatten()\n",
1648
+ "\n",
1649
+ " pre.update_state(y_true, pred)\n",
1650
+ " re.update_state(y_true, pred)\n",
1651
+ " acc.update_state(y_true, pred)"
1652
+ ],
1653
+ "id": "sQFmLI5JbQJZ"
1654
+ },
1655
+ {
1656
+ "cell_type": "code",
1657
+ "execution_count": null,
1658
+ "metadata": {
1659
+ "id": "TRs7GXOddNAw",
1660
+ "colab": {
1661
+ "base_uri": "https://localhost:8080/"
1662
+ },
1663
+ "outputId": "95910681-d680-4272-94bd-c6a94b4bfcc0"
1664
+ },
1665
+ "outputs": [
1666
+ {
1667
+ "output_type": "stream",
1668
+ "name": "stdout",
1669
+ "text": [
1670
+ "Precision: 0.9102380275726318, Recall: 0.9139072895050049, Accuracy: 0.49949848651885986\n"
1671
+ ]
1672
+ }
1673
+ ],
1674
+ "source": [
1675
+ "print(f\"Precision: {pre.result().numpy()}, Recall: {re.result().numpy()}, Accuracy: {acc.result().numpy()}\")"
1676
+ ],
1677
+ "id": "TRs7GXOddNAw"
1678
+ },
1679
+ {
1680
+ "cell_type": "code",
1681
+ "execution_count": null,
1682
+ "metadata": {
1683
+ "id": "1oEUJDL5eymH"
1684
+ },
1685
+ "outputs": [],
1686
+ "source": [
1687
+ "model.save('toxic-detect.h5')"
1688
+ ],
1689
+ "id": "1oEUJDL5eymH"
1690
+ },
1691
+ {
1692
+ "cell_type": "markdown",
1693
+ "metadata": {
1694
+ "id": "jFglatzteIXT"
1695
+ },
1696
+ "source": [
1697
+ "# 6. Test and Gradio"
1698
+ ],
1699
+ "id": "jFglatzteIXT"
1700
+ },
1701
+ {
1702
+ "cell_type": "code",
1703
+ "execution_count": null,
1704
+ "metadata": {
1705
+ "id": "Tg_jFNCOdC3V"
1706
+ },
1707
+ "outputs": [],
1708
+ "source": [
1709
+ "!pip install gradio jinja2"
1710
+ ],
1711
+ "id": "Tg_jFNCOdC3V"
1712
+ },
1713
+ {
1714
+ "cell_type": "code",
1715
+ "execution_count": null,
1716
+ "metadata": {
1717
+ "id": "dKH2Er6Eenim"
1718
+ },
1719
+ "outputs": [],
1720
+ "source": [
1721
+ "import gradio as gr"
1722
+ ],
1723
+ "id": "dKH2Er6Eenim"
1724
+ },
1725
+ {
1726
+ "cell_type": "code",
1727
+ "execution_count": null,
1728
+ "metadata": {
1729
+ "id": "JES3zWnRfHKt"
1730
+ },
1731
+ "outputs": [],
1732
+ "source": [
1733
+ "model = tf.keras.models.load_model('toxic-detect.h5')"
1734
+ ],
1735
+ "id": "JES3zWnRfHKt"
1736
+ },
1737
+ {
1738
+ "cell_type": "code",
1739
+ "execution_count": null,
1740
+ "metadata": {
1741
+ "id": "q_zuX1vVfYHq"
1742
+ },
1743
+ "outputs": [],
1744
+ "source": [
1745
+ "def evaluate_comment(Comment):\n",
1746
+ " processed_Comment = vectorizer([Comment])\n",
1747
+ " res = model.predict(processed_Comment)\n",
1748
+ "\n",
1749
+ " text = ''\n",
1750
+ " for i, col in enumerate(df.columns[2:]):\n",
1751
+ " text += '{}: {}\\n'.format(col, 'Violate' if res[0][i] > 0.5 else 'None')\n",
1752
+ " \n",
1753
+ " return text"
1754
+ ],
1755
+ "id": "q_zuX1vVfYHq"
1756
+ },
1757
+ {
1758
+ "cell_type": "code",
1759
+ "execution_count": null,
1760
+ "metadata": {
1761
+ "id": "TpJeqs__gsCh"
1762
+ },
1763
+ "outputs": [],
1764
+ "source": [
1765
+ "interface = gr.Interface(fn = evaluate_comment, \n",
1766
+ " inputs = gr.Textbox(lines = 4, placeholder='Comment to evaluate'), \n",
1767
+ " outputs = 'text')"
1768
+ ],
1769
+ "id": "TpJeqs__gsCh"
1770
+ },
1771
+ {
1772
+ "cell_type": "code",
1773
+ "execution_count": null,
1774
+ "metadata": {
1775
+ "id": "a3DOdPazhGuW"
1776
+ },
1777
+ "outputs": [],
1778
+ "source": [
1779
+ "interface.launch(share=True)"
1780
+ ],
1781
+ "id": "a3DOdPazhGuW"
1782
+ }
1783
+ ],
1784
+ "metadata": {
1785
+ "accelerator": "GPU",
1786
+ "colab": {
1787
+ "collapsed_sections": [],
1788
+ "provenance": []
1789
+ },
1790
+ "kernelspec": {
1791
+ "display_name": "Python 3 (ipykernel)",
1792
+ "language": "python",
1793
+ "name": "python3"
1794
+ },
1795
+ "language_info": {
1796
+ "codemirror_mode": {
1797
+ "name": "ipython",
1798
+ "version": 3
1799
+ },
1800
+ "file_extension": ".py",
1801
+ "mimetype": "text/x-python",
1802
+ "name": "python",
1803
+ "nbconvert_exporter": "python",
1804
+ "pygments_lexer": "ipython3",
1805
+ "version": "3.10.6"
1806
+ }
1807
+ },
1808
+ "nbformat": 4,
1809
+ "nbformat_minor": 5
1810
+ }