AndreMitri committed on
Commit
613102e
•
1 Parent(s): 93d7c12

Adding images, explanatory notebooks, and the data

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ data/imdb_reviews.csv filter=lfs diff=lfs merge=lfs -text
data/imdb_reviews.csv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:6f1314f123ac922d7d0f2bd5bd17f1734e167d90b2256c34963228bc63f6a4cb
3
+ size 66262310
imagens/BERT_TDIDF.png ADDED
imagens/Simbolico_WordCloud_Wordnet.png ADDED
notebooks_explicativos/Estatistico.ipynb ADDED
@@ -0,0 +1,765 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {
6
+ "id": "lawNHLqffR_m"
7
+ },
8
+ "source": [
9
+ "# SCC0633/SCC5908 - Natural Language Processing\n",
10
+ "> **Instructor:** Thiago Alexandre Salgueiro Pardo \\\n",
11
+ "> **PAE Teaching Intern:** Germano Antonio Zani Jorge\n",
12
+ "\n",
13
+ "\n",
14
+ "# Group Members: GPTrouxas\n",
15
+ "> André Guarnier De Mitri - 11395579 \\\n",
16
+ "> Daniel Carvalho - 10685702 \\\n",
17
+ "> Fernando - 11795342 \\\n",
18
+ "> Lucas Henrique Sant'Anna - 10748521 \\\n",
19
+ "> Magaly L Fujimoto - 4890582"
20
+ ]
21
+ },
22
+ {
23
+ "cell_type": "markdown",
24
+ "metadata": {
25
+ "id": "pV6WGoBln8id"
26
+ },
27
+ "source": [
28
+ "# New Section"
29
+ ]
30
+ },
31
+ {
32
+ "cell_type": "markdown",
33
+ "metadata": {},
34
+ "source": [
35
+ "# Statistical Approach\n",
36
+ "The architecture of the statistical/neural solution involves two approaches,\n",
37
+ "both described in this document. The first approach uses\n",
38
+ "TF-IDF and Naive Bayes. The second approach uses Word2Vec and a\n",
39
+ "pre-trained transformer model from the BERT family, fine-tuning the\n",
40
+ "model.\n",
41
+ "\n",
42
+ "In the first approach, we use TF-IDF, which takes into account the\n",
43
+ "frequency with which terms occur in a corpus and produces a sequence of\n",
44
+ "vectors that are fed to Naive Bayes to classify each review as\n",
45
+ "positive or negative.\n",
46
+ "\n",
47
+ "\n",
48
+ "In the second approach, we use Word2Vec to vectorize the reviews.\n",
49
+ "After splitting the data into training and test sets, we fine-tune a BERT-type model\n",
50
+ "for our specific problem and dataset. With the adapted BERT, we\n",
51
+ "classify our texts, measuring performance with the F1 score and\n",
52
+ "accuracy.\n",
53
+ "\n",
54
+ "![alt text](../imagens/BERT_TDIDF.png)"
55
+ ]
56
+ },
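To make the TF-IDF weighting described in the cell above concrete, here is a minimal pure-Python sketch. It assumes whitespace tokenization and a tiny hypothetical corpus, and it uses the smoothed IDF, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization, which are the defaults of scikit-learn's `TfidfVectorizer`. The function name `tfidf_vectors` and the sample sentences are illustrative and not part of the notebook.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return the sorted vocabulary and one L2-normalized TF-IDF row per document."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # Raw term count times IDF, then scale the row to unit length.
        row = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(w * w for w in row)) or 1.0
        rows.append([w / norm for w in row])
    return vocab, rows


# Hypothetical mini-corpus, already preprocessed.
corpus = [
    "wonderful little production",
    "awful movie awful plot",
    "wonderful movie",
]
vocab, rows = tfidf_vectors(corpus)
for doc, row in zip(corpus, rows):
    weights = {t: round(w, 3) for t, w in zip(vocab, row) if w > 0}
    print(doc, "->", weights)
```

Terms that appear in fewer documents receive a larger weight, so in the first sentence "little" outweighs "wonderful", which also occurs in the third one.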
57
+ {
58
+ "cell_type": "markdown",
59
+ "metadata": {
60
+ "id": "vfP54aryxZBg"
61
+ },
62
+ "source": [
63
+ "\n",
64
+ "## Steps of the Statistical Approach\n",
65
+ "\n",
66
+ "1. **Libraries**: We import the required libraries: pandas for data manipulation, train_test_split to divide the dataset into training and test sets, TfidfVectorizer to vectorize text with TF-IDF, MultinomialNB for the Multinomial Naive Bayes classifier, and several evaluation metrics.\n",
67
+ "\n",
68
+ "2. **Dataset**: We load the dataset and store it in a DataFrame using pandas.\n",
69
+ "\n",
70
+ "3. **Split the dataset**: We use `train_test_split` to divide the DataFrame into training and test sets.\n",
71
+ "\n",
72
+ "4. **TF-IDF**: We use `TfidfVectorizer` to convert the review texts into numeric vectors with TF-IDF. We fit the vectorizer on the training set and then transform both the training and test sets.\n",
73
+ "\n",
74
+ "5. **Naive Bayes**: We train a Multinomial Naive Bayes classifier and use the trained model to predict the sentiments of the test set with `predict`.\n",
75
+ "\n",
76
+ "6. **Evaluation and Results**: We save the results in a new DataFrame, `results_df`, containing the test-set reviews, the original sentiments, and the sentiments predicted by the model. We also evaluate the model by checking several metrics and the confusion matrix.\n",
77
+ "\n"
78
+ ]
79
+ },
80
+ {
81
+ "cell_type": "markdown",
82
+ "metadata": {
83
+ "id": "TbLraa4UhWDJ"
84
+ },
85
+ "source": [
86
+ "\n",
87
+ "## Downloading and Loading the Data, and Preprocessing\n",
88
+ "\n",
89
+ "1. Convert all text to lowercase \\\\\n",
90
+ "2. Remove special characters \\\\\n",
91
+ "3. Remove stop words \\\\\n",
92
+ "4. Lemmatization \\\\\n",
93
+ "5. Tokenization \\\\"
94
+ ]
95
+ },
96
+ {
97
+ "cell_type": "code",
98
+ "execution_count": 1,
99
+ "metadata": {
100
+ "id": "bIWmIe0qfTbE"
101
+ },
102
+ "outputs": [],
103
+ "source": [
104
+ "import pandas as pd"
105
+ ]
106
+ },
107
+ {
108
+ "cell_type": "code",
109
+ "execution_count": 2,
110
+ "metadata": {
111
+ "colab": {
112
+ "base_uri": "https://localhost:8080/",
113
+ "height": 206
114
+ },
115
+ "id": "Wf0n2yPdAn4C",
116
+ "outputId": "37eb3c4d-40c1-41a0-9b1a-d93ed6e272f3"
117
+ },
118
+ "outputs": [
119
+ {
120
+ "data": {
121
+ "text/html": [
122
+ "<div>\n",
123
+ "<style scoped>\n",
124
+ " .dataframe tbody tr th:only-of-type {\n",
125
+ " vertical-align: middle;\n",
126
+ " }\n",
127
+ "\n",
128
+ " .dataframe tbody tr th {\n",
129
+ " vertical-align: top;\n",
130
+ " }\n",
131
+ "\n",
132
+ " .dataframe thead th {\n",
133
+ " text-align: right;\n",
134
+ " }\n",
135
+ "</style>\n",
136
+ "<table border=\"1\" class=\"dataframe\">\n",
137
+ " <thead>\n",
138
+ " <tr style=\"text-align: right;\">\n",
139
+ " <th></th>\n",
140
+ " <th>review</th>\n",
141
+ " <th>sentiment</th>\n",
142
+ " </tr>\n",
143
+ " </thead>\n",
144
+ " <tbody>\n",
145
+ " <tr>\n",
146
+ " <th>0</th>\n",
147
+ " <td>One of the other reviewers has mentioned that ...</td>\n",
148
+ " <td>positive</td>\n",
149
+ " </tr>\n",
150
+ " <tr>\n",
151
+ " <th>1</th>\n",
152
+ " <td>A wonderful little production. &lt;br /&gt;&lt;br /&gt;The...</td>\n",
153
+ " <td>positive</td>\n",
154
+ " </tr>\n",
155
+ " <tr>\n",
156
+ " <th>2</th>\n",
157
+ " <td>I thought this was a wonderful way to spend ti...</td>\n",
158
+ " <td>positive</td>\n",
159
+ " </tr>\n",
160
+ " <tr>\n",
161
+ " <th>3</th>\n",
162
+ " <td>Basically there's a family where a little boy ...</td>\n",
163
+ " <td>negative</td>\n",
164
+ " </tr>\n",
165
+ " <tr>\n",
166
+ " <th>4</th>\n",
167
+ " <td>Petter Mattei's \"Love in the Time of Money\" is...</td>\n",
168
+ " <td>positive</td>\n",
169
+ " </tr>\n",
170
+ " </tbody>\n",
171
+ "</table>\n",
172
+ "</div>"
173
+ ],
174
+ "text/plain": [
175
+ " review sentiment\n",
176
+ "0 One of the other reviewers has mentioned that ... positive\n",
177
+ "1 A wonderful little production. <br /><br />The... positive\n",
178
+ "2 I thought this was a wonderful way to spend ti... positive\n",
179
+ "3 Basically there's a family where a little boy ... negative\n",
180
+ "4 Petter Mattei's \"Love in the Time of Money\" is... positive"
181
+ ]
182
+ },
183
+ "execution_count": 2,
184
+ "metadata": {},
185
+ "output_type": "execute_result"
186
+ }
187
+ ],
188
+ "source": [
189
+ "db = pd.read_csv('../data/imdb_reviews.csv')\n",
190
+ "db.head(5)"
191
+ ]
192
+ },
193
+ {
194
+ "cell_type": "code",
195
+ "execution_count": 3,
196
+ "metadata": {
197
+ "colab": {
198
+ "base_uri": "https://localhost:8080/"
199
+ },
200
+ "id": "6PlfPScGMF1_",
201
+ "outputId": "2a0bd4a1-e22a-429d-82a4-5984eeab7b9d"
202
+ },
203
+ "outputs": [
204
+ {
205
+ "data": {
206
+ "text/plain": [
207
+ "sentiment\n",
208
+ "positive 25000\n",
209
+ "negative 25000\n",
210
+ "Name: count, dtype: int64"
211
+ ]
212
+ },
213
+ "execution_count": 3,
214
+ "metadata": {},
215
+ "output_type": "execute_result"
216
+ }
217
+ ],
218
+ "source": [
219
+ "db['sentiment'].value_counts()"
220
+ ]
221
+ },
222
+ {
223
+ "cell_type": "code",
224
+ "execution_count": 4,
225
+ "metadata": {
226
+ "colab": {
227
+ "base_uri": "https://localhost:8080/"
228
+ },
229
+ "id": "Kev0EaSmMa4N",
230
+ "outputId": "eab73a61-ba36-4d72-e4f2-82236f9f2880"
231
+ },
232
+ "outputs": [
233
+ {
234
+ "name": "stdout",
235
+ "output_type": "stream",
236
+ "text": [
237
+ "Quantidade de valores faltantes para cada variável do dataset:\n",
238
+ "review 0\n",
239
+ "sentiment 0\n",
240
+ "dtype: int64\n"
241
+ ]
242
+ }
243
+ ],
244
+ "source": [
245
+ "valores_ausentes = db.isnull().sum(axis=0)\n",
246
+ "print('Quantidade de valores faltantes para cada variável do dataset:')\n",
247
+ "print(valores_ausentes)"
248
+ ]
249
+ },
250
+ {
251
+ "cell_type": "code",
252
+ "execution_count": 5,
253
+ "metadata": {
254
+ "colab": {
255
+ "base_uri": "https://localhost:8080/",
256
+ "height": 276
257
+ },
258
+ "id": "1AI3rN0KMuUq",
259
+ "outputId": "7ea5c91b-362e-49eb-82a7-6e8535f0e591"
260
+ },
261
+ "outputs": [
262
+ {
263
+ "name": "stderr",
264
+ "output_type": "stream",
265
+ "text": [
266
+ "[nltk_data] Downloading package stopwords to\n",
267
+ "[nltk_data] C:\\Users\\andre\\AppData\\Roaming\\nltk_data...\n",
268
+ "[nltk_data] Package stopwords is already up-to-date!\n",
269
+ "[nltk_data] Downloading package wordnet to\n",
270
+ "[nltk_data] C:\\Users\\andre\\AppData\\Roaming\\nltk_data...\n",
271
+ "[nltk_data] Package wordnet is already up-to-date!\n"
272
+ ]
273
+ },
274
+ {
275
+ "data": {
276
+ "text/html": [
277
+ "<div>\n",
278
+ "<style scoped>\n",
279
+ " .dataframe tbody tr th:only-of-type {\n",
280
+ " vertical-align: middle;\n",
281
+ " }\n",
282
+ "\n",
283
+ " .dataframe tbody tr th {\n",
284
+ " vertical-align: top;\n",
285
+ " }\n",
286
+ "\n",
287
+ " .dataframe thead th {\n",
288
+ " text-align: right;\n",
289
+ " }\n",
290
+ "</style>\n",
291
+ "<table border=\"1\" class=\"dataframe\">\n",
292
+ " <thead>\n",
293
+ " <tr style=\"text-align: right;\">\n",
294
+ " <th></th>\n",
295
+ " <th>review</th>\n",
296
+ " <th>sentiment</th>\n",
297
+ " </tr>\n",
298
+ " </thead>\n",
299
+ " <tbody>\n",
300
+ " <tr>\n",
301
+ " <th>0</th>\n",
302
+ " <td>one reviewer mentioned watching 1 oz episode h...</td>\n",
303
+ " <td>positive</td>\n",
304
+ " </tr>\n",
305
+ " <tr>\n",
306
+ " <th>1</th>\n",
307
+ " <td>wonderful little production filming technique ...</td>\n",
308
+ " <td>positive</td>\n",
309
+ " </tr>\n",
310
+ " <tr>\n",
311
+ " <th>2</th>\n",
312
+ " <td>thought wonderful way spend time hot summer we...</td>\n",
313
+ " <td>positive</td>\n",
314
+ " </tr>\n",
315
+ " <tr>\n",
316
+ " <th>3</th>\n",
317
+ " <td>basically family little boy jake think zombie ...</td>\n",
318
+ " <td>negative</td>\n",
319
+ " </tr>\n",
320
+ " <tr>\n",
321
+ " <th>4</th>\n",
322
+ " <td>petter mattei love time money visually stunnin...</td>\n",
323
+ " <td>positive</td>\n",
324
+ " </tr>\n",
325
+ " </tbody>\n",
326
+ "</table>\n",
327
+ "</div>"
328
+ ],
329
+ "text/plain": [
330
+ " review sentiment\n",
331
+ "0 one reviewer mentioned watching 1 oz episode h... positive\n",
332
+ "1 wonderful little production filming technique ... positive\n",
333
+ "2 thought wonderful way spend time hot summer we... positive\n",
334
+ "3 basically family little boy jake think zombie ... negative\n",
335
+ "4 petter mattei love time money visually stunnin... positive"
336
+ ]
337
+ },
338
+ "execution_count": 5,
339
+ "metadata": {},
340
+ "output_type": "execute_result"
341
+ }
342
+ ],
343
+ "source": [
344
+ "import re\n",
345
+ "import nltk\n",
346
+ "from nltk.corpus import stopwords\n",
347
+ "from nltk.stem import PorterStemmer\n",
348
+ "from nltk.stem import WordNetLemmatizer\n",
349
+ "\n",
350
+ "def lowercase_text(text):\n",
351
+ "    return text.lower()\n",
352
+ "\n",
353
+ "def remove_html(text):\n",
354
+ "    return re.sub(r'<[^<]+?>', '', text)\n",
355
+ "\n",
356
+ "def remove_url(text):\n",
357
+ "    return re.sub(r'http[s]?://\\S+|www\\.\\S+', '', text)\n",
358
+ "\n",
359
+ "def remove_punctuations(text):\n",
360
+ "    tokens_list = '!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'\n",
361
+ "    for char in text:\n",
362
+ "        if char in tokens_list:\n",
363
+ "            text = text.replace(char, ' ')\n",
364
+ "\n",
365
+ "    return text\n",
366
+ "\n",
367
+ "def remove_emojis(text):\n",
368
+ "    emojis = re.compile(\"[\"\n",
369
+ "        u\"\\U0001F600-\\U0001F64F\"\n",
370
+ "        u\"\\U0001F300-\\U0001F5FF\"\n",
371
+ "        u\"\\U0001F680-\\U0001F6FF\"\n",
372
+ "        u\"\\U0001F1E0-\\U0001F1FF\"\n",
373
+ "        u\"\\U00002500-\\U00002BEF\"\n",
374
+ "        u\"\\U00002702-\\U000027B0\"\n",
375
+ "        u\"\\U00002702-\\U000027B0\"\n",
376
+ "        u\"\\U000024C2-\\U0001F251\"\n",
377
+ "        u\"\\U0001f926-\\U0001f937\"\n",
378
+ "        u\"\\U00010000-\\U0010ffff\"\n",
379
+ "        u\"\\u2640-\\u2642\"\n",
380
+ "        u\"\\u2600-\\u2B55\"\n",
381
+ "        u\"\\u200d\"\n",
382
+ "        u\"\\u23cf\"\n",
383
+ "        u\"\\u23e9\"\n",
384
+ "        u\"\\u231a\"\n",
385
+ "        u\"\\ufe0f\"\n",
386
+ "        u\"\\u3030\"\n",
387
+ "        \"]+\", re.UNICODE)\n",
388
+ "\n",
389
+ "    text = re.sub(emojis, '', text)\n",
390
+ "    return text\n",
391
+ "\n",
392
+ "def remove_stop_words(text):\n",
393
+ "    stop_words = stopwords.words('english')\n",
394
+ "    new_text = ''\n",
395
+ "    for word in text.split():\n",
396
+ "        if word not in stop_words:\n",
397
+ "            new_text += ''.join(f'{word} ')\n",
398
+ "\n",
399
+ "    return new_text.strip()\n",
400
+ "\n",
401
+ "def lem_words(text):\n",
402
+ "    lemma = WordNetLemmatizer()\n",
403
+ "    new_text = ''\n",
404
+ "    for word in text.split():\n",
405
+ "        new_text += ''.join(f'{lemma.lemmatize(word)} ')\n",
406
+ "\n",
407
+ "    return new_text\n",
408
+ "\n",
409
+ "def preprocess_text(text):\n",
410
+ "    text = lowercase_text(text)\n",
411
+ "    text = remove_html(text)\n",
412
+ "    text = remove_url(text)\n",
413
+ "    text = remove_punctuations(text)\n",
414
+ "    text = remove_emojis(text)\n",
415
+ "    text = remove_stop_words(text)\n",
416
+ "    text = lem_words(text)\n",
417
+ "\n",
418
+ "    return text\n",
419
+ "\n",
420
+ "nltk.download('stopwords')\n",
421
+ "nltk.download('wordnet')\n",
422
+ "db['review'] = db['review'].apply(preprocess_text)\n",
423
+ "db.head()"
424
+ ]
425
+ },
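The cleaning steps in the cell above can also be sketched with only the standard library, which may help when checking the logic without downloading NLTK data. This is a hedged illustration, not the notebook's implementation: `STOP_WORDS` is a small hypothetical stand-in for NLTK's English list, `clean_review` is an illustrative name, and lemmatization and emoji removal are left out. It replaces the per-character loop with a single `str.translate` pass and looks up stop words in a set.

```python
import re
import string

# Hypothetical, tiny stop-word set; the notebook uses NLTK's English list.
STOP_WORDS = frozenset({"a", "an", "the", "of", "is", "was", "this", "that", "to", "and"})

HTML_TAG = re.compile(r"<[^<]+?>")
URL = re.compile(r"https?://\S+|www\.\S+")
# Map every ASCII punctuation character to a space in a single pass.
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_review(text):
    """Lowercase, strip HTML tags and URLs, drop punctuation and stop words."""
    text = text.lower()
    text = HTML_TAG.sub("", text)
    text = URL.sub("", text)
    text = text.translate(PUNCT_TO_SPACE)
    # split() also collapses the repeated spaces left by the earlier steps.
    return " ".join(word for word in text.split() if word not in STOP_WORDS)


print(clean_review("A wonderful little production. <br /><br />The filming is great!"))
```

Because membership tests on a set take constant time on average, this variant avoids rescanning a list of stop words for every token, which matters for a corpus of 50,000 reviews.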
426
+ {
427
+ "cell_type": "markdown",
428
+ "metadata": {
429
+ "id": "QgufZpgHnPa4"
430
+ },
431
+ "source": [
432
+ "# **Training and Test Sets**"
433
+ ]
434
+ },
435
+ {
436
+ "cell_type": "code",
437
+ "execution_count": 6,
438
+ "metadata": {
439
+ "id": "s0lJ6Q0tnPka"
440
+ },
441
+ "outputs": [],
442
+ "source": [
443
+ "from sklearn.model_selection import train_test_split\n",
444
+ "\n",
445
+ "X = db['review']\n",
446
+ "y = db['sentiment']\n",
447
+ "\n",
448
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=12)"
449
+ ]
450
+ },
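The idea behind the 80/20 hold-out split in the cell above can be sketched in plain Python: shuffle the indices with a fixed seed, then reserve the first share for testing. This is only an illustration of the concept. It does not reproduce the exact rows that scikit-learn's `train_test_split` selects for `random_state=12`, and `shuffle_split` and the synthetic reviews are hypothetical names and data.

```python
import random


def shuffle_split(items, labels, test_size=0.2, seed=12):
    """Shuffle indices with a fixed seed, then hold out the first `test_size` share."""
    indices = list(range(len(items)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(indices) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Index items and labels with the same positions so pairs stay aligned.
    return (
        [items[i] for i in train_idx],
        [items[i] for i in test_idx],
        [labels[i] for i in train_idx],
        [labels[i] for i in test_idx],
    )


# Hypothetical stand-in for the 50,000 balanced reviews.
reviews = [f"review {i}" for i in range(50_000)]
sentiments = ["positive" if i % 2 == 0 else "negative" for i in range(50_000)]
X_train, X_test, y_train, y_test = shuffle_split(reviews, sentiments)
print(len(X_train), len(X_test))
```

With 50,000 rows and `test_size=0.2`, the training set holds 40,000 reviews, which agrees with the `Length: 40000` reported for `X_train` in the notebook output.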
451
+ {
452
+ "cell_type": "code",
453
+ "execution_count": 7,
454
+ "metadata": {
455
+ "colab": {
456
+ "base_uri": "https://localhost:8080/"
457
+ },
458
+ "id": "nz4erCEJuD4-",
459
+ "outputId": "88d57536-66e7-4d9b-e016-bf40183d4c45"
460
+ },
461
+ "outputs": [
462
+ {
463
+ "data": {
464
+ "text/plain": [
465
+ "35235 disagree people saying lousy horror film good ...\n",
466
+ "36936 husband wife doctor team carole nile nelson mo...\n",
467
+ "46486 like cast pretty much however story sort unfol...\n",
468
+ "27160 movie awful bad bear expend anything word avoi...\n",
469
+ "19490 purchased blood castle dvd ebay buck knowing s...\n",
470
+ " ... \n",
471
+ "36482 strange thing see film scene work rather weakl...\n",
472
+ "40177 saw cheap dvd release title entity force since...\n",
473
+ "19709 one peculiar oft used romance movie plot one s...\n",
474
+ "38555 nothing positive say meandering nonsense huffi...\n",
475
+ "14155 low moment life bewildered depressed sitting r...\n",
476
+ "Name: review, Length: 40000, dtype: object"
477
+ ]
478
+ },
479
+ "execution_count": 7,
480
+ "metadata": {},
481
+ "output_type": "execute_result"
482
+ }
483
+ ],
484
+ "source": [
485
+ "X_train"
486
+ ]
487
+ },
488
+ {
489
+ "cell_type": "markdown",
490
+ "metadata": {
491
+ "id": "6LX-6e-QlioJ"
492
+ },
493
+ "source": [
494
+ "# **TF-IDF and Naive Bayes**"
495
+ ]
496
+ },
497
+ {
498
+ "cell_type": "code",
499
+ "execution_count": 8,
500
+ "metadata": {
501
+ "id": "gscB9-obNusA"
502
+ },
503
+ "outputs": [],
504
+ "source": [
505
+ "from sklearn.metrics import confusion_matrix,classification_report\n",
506
+ "from sklearn.feature_extraction.text import TfidfVectorizer\n",
507
+ "from sklearn.preprocessing import StandardScaler as encoder\n",
508
+ "from sklearn.metrics import (\n",
509
+ "    accuracy_score,\n",
510
+ "    confusion_matrix,\n",
511
+ "    ConfusionMatrixDisplay,\n",
512
+ "    f1_score,\n",
513
+ ")\n",
514
+ "\n",
515
+ "\n",
516
+ "tfidf = TfidfVectorizer()\n",
517
+ "tfidf_train = tfidf.fit_transform(X_train)\n",
518
+ "tfidf_test = tfidf.transform(X_test)\n",
519
+ "\n",
520
+ "from sklearn.naive_bayes import MultinomialNB\n",
521
+ "\n",
522
+ "naive_bayes = MultinomialNB()\n",
523
+ "\n",
524
+ "naive_bayes.fit(tfidf_train, y_train)\n",
525
+ "y_pred = naive_bayes.predict(tfidf_test)\n",
526
+ "\n",
527
+ "\n"
528
+ ]
529
+ },
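For readers who want to see what the classifier in the cell above estimates, here is a minimal sketch of multinomial Naive Bayes written from scratch. It is an assumption-laden simplification: it works on raw token counts rather than TF-IDF weights, uses Laplace smoothing with `alpha=1.0` (the default of `MultinomialNB`), and the training sentences, `train_multinomial_nb`, and `predict` are illustrative names and data rather than anything taken from the notebook.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and Laplace-smoothed log likelihoods from token counts."""
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    class_docs = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc.split())

    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_likelihood = {}
    for c in class_docs:
        # Denominator: all tokens seen in class c plus alpha per vocabulary term.
        total = sum(token_counts[c].values()) + alpha * len(vocab)
        log_likelihood[c] = {
            t: math.log((token_counts[c][t] + alpha) / total) for t in vocab
        }
    return log_prior, log_likelihood


def predict(doc, log_prior, log_likelihood):
    """Return the class with the highest log score; tokens outside the vocabulary are skipped."""
    scores = {}
    for c in log_prior:
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][t] for t in doc.split() if t in log_likelihood[c]
        )
    return max(scores, key=scores.get)


# Hypothetical, already preprocessed training reviews.
train_docs = [
    "wonderful little production",
    "wonderful way spend time",
    "awful bad movie",
    "meandering nonsense bad plot",
]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_multinomial_nb(train_docs, train_labels)
print(predict("wonderful production", *model))
print(predict("bad nonsense", *model))
```

Working in log space turns the product of per-token probabilities into a sum, which avoids numerical underflow on long reviews.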
530
+ {
531
+ "cell_type": "code",
532
+ "execution_count": 9,
533
+ "metadata": {
534
+ "colab": {
535
+ "base_uri": "https://localhost:8080/",
536
+ "height": 206
537
+ },
538
+ "id": "RfJ7AHMZvAb8",
539
+ "outputId": "685701e1-b1e8-47fb-9dc5-1bc04dd3894b"
540
+ },
541
+ "outputs": [
542
+ {
543
+ "data": {
544
+ "text/html": [
545
+ "<div>\n",
546
+ "<style scoped>\n",
547
+ " .dataframe tbody tr th:only-of-type {\n",
548
+ " vertical-align: middle;\n",
549
+ " }\n",
550
+ "\n",
551
+ " .dataframe tbody tr th {\n",
552
+ " vertical-align: top;\n",
553
+ " }\n",
554
+ "\n",
555
+ " .dataframe thead th {\n",
556
+ " text-align: right;\n",
557
+ " }\n",
558
+ "</style>\n",
559
+ "<table border=\"1\" class=\"dataframe\">\n",
560
+ " <thead>\n",
561
+ " <tr style=\"text-align: right;\">\n",
562
+ " <th></th>\n",
563
+ " <th>review</th>\n",
564
+ " <th>original sentiment</th>\n",
565
+ " <th>predicted sentiment</th>\n",
566
+ " </tr>\n",
567
+ " </thead>\n",
568
+ " <tbody>\n",
569
+ " <tr>\n",
570
+ " <th>34622</th>\n",
571
+ " <td>hard tell noonan marshall trying ape abbott co...</td>\n",
572
+ " <td>negative</td>\n",
573
+ " <td>negative</td>\n",
574
+ " </tr>\n",
575
+ " <tr>\n",
576
+ " <th>1163</th>\n",
577
+ " <td>well start one reviewer said know real treat s...</td>\n",
578
+ " <td>positive</td>\n",
579
+ " <td>positive</td>\n",
580
+ " </tr>\n",
581
+ " <tr>\n",
582
+ " <th>7637</th>\n",
583
+ " <td>wife kid opinion absolute abc classic seen eve...</td>\n",
584
+ " <td>positive</td>\n",
585
+ " <td>positive</td>\n",
586
+ " </tr>\n",
587
+ " <tr>\n",
588
+ " <th>7045</th>\n",
589
+ " <td>surprise basic copycat comedy classic nutty pr...</td>\n",
590
+ " <td>positive</td>\n",
591
+ " <td>positive</td>\n",
592
+ " </tr>\n",
593
+ " <tr>\n",
594
+ " <th>43847</th>\n",
595
+ " <td>josef von sternberg directs magnificent silent...</td>\n",
596
+ " <td>positive</td>\n",
597
+ " <td>positive</td>\n",
598
+ " </tr>\n",
599
+ " </tbody>\n",
600
+ "</table>\n",
601
+ "</div>"
602
+ ],
603
+ "text/plain": [
604
+ " review original sentiment \\\n",
605
+ "34622 hard tell noonan marshall trying ape abbott co... negative \n",
606
+ "1163 well start one reviewer said know real treat s... positive \n",
607
+ "7637 wife kid opinion absolute abc classic seen eve... positive \n",
608
+ "7045 surprise basic copycat comedy classic nutty pr... positive \n",
609
+ "43847 josef von sternberg directs magnificent silent... positive \n",
610
+ "\n",
611
+ " predicted sentiment \n",
612
+ "34622 negative \n",
613
+ "1163 positive \n",
614
+ "7637 positive \n",
615
+ "7045 positive \n",
616
+ "43847 positive "
617
+ ]
618
+ },
619
+ "execution_count": 9,
620
+ "metadata": {},
621
+ "output_type": "execute_result"
622
+ }
623
+ ],
624
+ "source": [
625
+ "# Build a DataFrame with the results\n",
626
+ "results_df = pd.DataFrame({'review': X_test, 'original sentiment': y_test, 'predicted sentiment': y_pred})\n",
627
+ "results_df.head()"
628
+ ]
629
+ },
630
+ {
631
+ "cell_type": "markdown",
632
+ "metadata": {
633
+ "id": "8Xq2ABXYtsjk"
634
+ },
635
+ "source": [
636
+ "## Evaluation"
637
+ ]
638
+ },
639
+ {
640
+ "cell_type": "code",
641
+ "execution_count": 10,
642
+ "metadata": {
643
+ "id": "3lXqDNhSrhsZ"
644
+ },
645
+ "outputs": [],
646
+ "source": [
647
+ "from sklearn.metrics import confusion_matrix, classification_report\n",
648
+ "import seaborn as sns\n",
649
+ "import matplotlib.pyplot as plt\n",
650
+ "\n",
651
+ "def plot_confusion_matrix(y_true, y_pred, labels, model_name):\n",
652
+ "    cm = confusion_matrix(y_true, y_pred, labels=labels)\n",
653
+ "    plt.figure(figsize=(8, 6))\n",
654
+ "    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=labels, yticklabels=labels)\n",
655
+ "    plt.xlabel('Predicted Labels')\n",
656
+ "    plt.ylabel('True Labels')\n",
657
+ "    plt.title(f'Confusion Matrix {model_name}')\n",
658
+ "    plt.show()\n",
659
+ "\n",
660
+ "# Function to compute and print the evaluation metrics\n",
661
+ "def print_evaluation_metrics(y_true, y_pred, model_name):\n",
662
+ "    print(f\"Classification Report {model_name}:\")\n",
663
+ "    print(classification_report(y_true, y_pred))\n"
664
+ ]
665
+ },
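The numbers summarized by the confusion matrix and the classification report can be derived by hand, as in the following sketch. It assumes a binary task with `"positive"` as the positive class; the label lists are hypothetical and are not results from the notebook, and `confusion_counts` and `binary_metrics` are illustrative helper names.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="positive"):
    """Return (tp, fp, fn, tn) for a binary task."""
    pairs = Counter(zip(y_true, y_pred))
    tp = sum(n for (t, p), n in pairs.items() if t == positive and p == positive)
    fp = sum(n for (t, p), n in pairs.items() if t != positive and p == positive)
    fn = sum(n for (t, p), n in pairs.items() if t == positive and p != positive)
    tn = sum(n for (t, p), n in pairs.items() if t != positive and p != positive)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive="positive"):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels, not results from the notebook.
y_true = ["positive", "positive", "positive", "negative", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "negative", "positive"]
print(confusion_counts(y_true, y_pred))
print(binary_metrics(y_true, y_pred))
```

On a balanced dataset such as this one, accuracy and F1 tend to tell a similar story; they diverge when one class dominates.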
666
+ {
667
+ "cell_type": "code",
668
+ "execution_count": 11,
669
+ "metadata": {
670
+ "colab": {
671
+ "base_uri": "https://localhost:8080/",
672
+ "height": 564
673
+ },
674
+ "id": "ybfb_GKDuqmb",
675
+ "outputId": "3e4c3a98-8962-4ce8-9856-2252f769a1b8"
676
+ },
677
+ "outputs": [
678
+ {
679
+ "data": {
680
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAApIAAAIhCAYAAAD91lq9AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjguMCwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy81sbWrAAAACXBIWXMAAA9hAAAPYQGoP6dpAABdSklEQVR4nO3deVxUdd//8ffIrsIIKCCGu3JpaporWLmvuV11pamRpmnmFrl126Ztkt6VVpaZmZpR2lVZakZZLmmKO7lEZqWpCa6IgogI5/eHP+duBBVOjDMyr2eP87id7/mecz5nHrdXnz7fZSyGYRgCAAAAiqiUswMAAADAzYlEEgAAAKaQSAIAAMAUEkkAAACYQiIJAAAAU0gkAQAAYAqJJAAAAEwhkQQAAIApJJIAAAAwhUQSuEF27typhx56SNWqVZOvr6/Kli2r22+/XdOmTdOpU6cc+uwdO3aoVatWslqtslgsmjFjRrE/w2KxaPLkycV+3+uZP3++LBaLLBaL1qxZk++8YRiqWbOmLBaLWrdubeoZb7/9tubPn1+ka9asWXPVmMy4/J6+vr76888/851v3bq16tWrZ9dWtWpV23dz+dqaNWtqzJgxOnHiRLHEBcC9eTo7AMAdzJkzR8OHD1dkZKTGjx+vunXrKicnR1u3btU777yjjRs3asmSJQ57/qBBg5SZmalFixYpMDBQVatWLfZnbNy4Ubfcckux37ew/P39NXfu3HzJ4tq1a/X777/L39/f9L3ffvttlS9fXgMHDiz0Nbfffrs2btyounXrmn5uQbKzs/X0009r4cKFherfsmVLvfLKK5KkrKwsbd26VZMnT9YPP/ygrVu3FmtsANwPiSTgYBs3btSjjz6qDh066IsvvpCPj4/tXIcOHTR27FglJCQ4NIbdu3dryJAh6tKli8Oe0aJFC4fduzD69Omj+Ph4vfXWWwoICLC1z507V1FRUTpz5swNiSMnJ0cWi0UBAQEO+U46d+6sjz76SOPGjdNtt9123f7lypWzi6NNmzY6e/asXnjhBf3666+qXbt2sccIwH0wtA042JQpU2SxWPTuu+/aJZGXeXt7q0ePHrbPeXl5mjZtmv71r3/Jx8dHISEhevDBB3X48GG76y4PZW7ZskV33nmnSpcurerVq+vll19WXl6epP8bDr148aJmzZplG+KUpMmTJ9v+/HeXrzlw4ICtbdWqVWrdurWCg4Pl5+enypUr695779W5c+dsfQoa2t69e7d69uypwMBA+fr6qmHDhlqwYIFdn8tDwB9//LGeeuophYeHKyAgQO3bt9fevXsL9yVL6tu3ryTp448/trWlp6frs88+06BBgwq85rnnnlPz5s0VFBSkgIAA3X777Zo7d64Mw7D1qVq1qvbs2aO1a9favr/LFd3LsS9cuFBjx45VpUqV5OPjo99++y3f0PaJEycUERGh6Oho5eTk2O7/888/q0yZMoqJiSnUe06YMEHBwcF64oknCv3dXMlqtUqSvLy8TN8DACQSScChcnNztWrVKjVu3FgRERGFuubRRx/VE088oQ4dOmjp0qV64YUXlJCQoOjo6Hzz2lJTU9W/f3898MADWrp0qbp06aKJEyfqww8/lCTdfffd2rhxoyTpP//5jzZu3Gj7XFgHDhzQ3XffLW9vb73//vtKSEjQyy+/rDJlyujChQtXvW7v3r2Kjo7Wnj179MYbb+jzzz9X3bp1NXDgQE2bNi1f/yeffFJ//vmn3nvvPb377rvat2+funfvrtzc3ELFGRAQoP/85z96//33bW0ff/yxSpUqpT59+lz13R555BF98skn+vzzz3XPPfdo1KhReuGFF2x9lixZourVq6tRo0a27+/KaQgTJ07UwYMH9c4772jZsmUKCQnJ96zy5ctr0aJF2rJliy0JPHfunO677z5VrlxZ77zzTqHe09/fX08//bS++eYbrVq16rr9DcPQxYsXdfHiRWVkZGj1
6tWaMWOGWrZsqWrVqhXqmQBwVQYAh0lNTTUkGffff3+h+icnJxuSjOHDh9u1b9q0yZBkPPnkk7a2Vq1aGZKMTZs22fWtW7eu0alTJ7s2ScaIESPs2iZNmmQU9D8B8+bNMyQZ+/fvNwzDMD799FNDkpGUlHTN2CUZkyZNsn2+//77DR8fH+PgwYN2/bp06WKULl3aOH36tGEYhrF69WpDktG1a1e7fp988okhydi4ceM1n3s53i1bttjutXv3bsMwDKNp06bGwIEDDcMwjFtvvdVo1arVVe+Tm5tr5OTkGM8//7wRHBxs5OXl2c5d7drLz7vrrruuem716tV27VOnTjUkGUuWLDEGDBhg+Pn5GTt37rzmO175ntnZ2Ub16tWNJk2a2OJs1aqVceutt9pdU6VKFUNSvqNZs2ZGSkrKdZ8JANdDRRJwIatXr5akfIs6mjVrpjp16uj777+3aw8LC1OzZs3s2ho0aFDgql6zGjZsKG9vbw0dOlQLFizQH3/8UajrVq1apXbt2uWrxA4cOFDnzp3LVxn9+/C+dOk9JBXpXVq1aqUaNWro/fff165du7Rly5arDmtfjrF9+/ayWq3y8PCQl5eXnn32WZ08eVLHjh0r9HPvvffeQvcdP3687r77bvXt21cLFizQm2++qfr16xf6eunSdIgXX3xRW7du1SeffHLNvnfccYe2bNmiLVu26Mcff9TcuXN1/PhxtW3blpXbAP4xEknAgcqXL6/SpUtr//79hep/8uRJSVLFihXznQsPD7edvyw4ODhfPx8fH2VlZZmItmA1atTQd999p5CQEI0YMUI1atRQjRo19Prrr1/zupMnT171PS6f/7sr3+XyfNKivIvFYtFDDz2kDz/8UO+8845q166tO++8s8C+mzdvVseOHSVdWlX/448/asuWLXrqqaeK/NyC3vNaMQ4cOFDnz59XWFhYoedGXun+++/X7bffrqeeespuzuWVrFarmjRpoiZNmig6OlqDBg3SRx99pOTkZL366qumng0Al5FIAg7k4eGhdu3aadu2bfkWyxTkcjKVkpKS79yRI0dUvnz5YovN19dX0qXtZP6uoCrVnXfeqWXLlik9PV2JiYmKiopSbGysFi1adNX7BwcHX/U9JBXru/zdwIEDdeLECb3zzjt66KGHrtpv0aJF8vLy0vLly9W7d29FR0erSZMmpp5Z0KKlq0lJSdGIESPUsGFDnTx5UuPGjTP9zKlTp+r333/Xu+++W6RrL1d7f/rpJ1PPBoDLSCQBB5s4caIMw9CQIUMKXJySk5OjZcuWSZLatm0rSbbFMpdt2bJFycnJateuXbHFdXnl8c6dO+3aL8dSEA8PDzVv3lxvvfWWJGn79u1X7duuXTutWrXKljhe9sEHH6h06dIO2y6oUqVKGj9+vLp3764BAwZctZ/FYpGnp6c8PDxsbVlZWQXuz1hcVd7c3Fz17dtXFotFX3/9teLi4vTmm2/q888/N3W/9u3bq0OHDnr++eeVkZFR6OuSkpIkqcBFQQBQFOwjCThYVFSUZs2apeHDh6tx48Z69NFHdeuttyonJ0c7duzQu+++q3r16ql79+6KjIzU0KFD9eabb6pUqVLq0qWLDhw4oGeeeUYRERF6/PHHiy2url27KigoSIMHD9bzzz8vT09PzZ8/X4cOHbLr984772jVqlW6++67VblyZZ0/f962Mrp9+/ZXvf+kSZO0fPlytWnTRs8++6yCgoIUHx+vr776StOmTbNtQeMIL7/88nX73H333XrttdfUr18/DR06VCdPntQrr7xS4BZN9evX16JFi7R48WJVr15dvr6+RZ7XKF36TtatW6dvv/1WYWFhGjt2rNauXavBgwerUaNGplZRT506VY0bN9axY8d066235jt/+vRpJSYmSrr0Hy3JycmaMmWKfHx8NGLEiCI/DwD+jkQSuAGGDBmiZs2aafr06Zo6dapSU1Pl5eWl2rVrq1+/fho5cqSt76xZ
s1SjRg3NnTtXb731lqxWqzp37qy4uLgC50SaFRAQoISEBMXGxuqBBx5QuXLl9PDDD6tLly56+OGHbf0aNmyob7/9VpMmTVJqaqrKli2revXqaenSpbY5hgWJjIzUhg0b9OSTT2rEiBHKyspSnTp1NG/evCL9QoyjtG3bVu+//76mTp2q7t27q1KlShoyZIhCQkI0ePBgu77PPfecUlJSNGTIEJ09e1ZVqlSx22ezMFauXKm4uDg988wzdpXl+fPnq1GjRurTp4/Wr18vb2/vIt23UaNG6tu3rz766KMCz//444+KioqSdKmiXKlSJTVr1kxPPfWUGjZsWKRnAcCVLIbxt513AQAAgEJijiQAAABMIZEEAACAKSSSAAAAMIVEEgAAAKaQSAIAAMAUEkkAAACYQiIJAAAAU0rkhuR+zcc7OwQADnJi3TRnhwDAQcp4F/5364ubX6OR1+9kUtaOmQ67t7NRkQQAAIApJbIiCQAAUCQWamtmkEgCAABYnDesfjMj/QYAAIApVCQBAAAY2jaFbw0AAACmUJEEAABgjqQpVCQBAABgChVJAAAA5kiawrcGAAAAU6hIAgAAMEfSFBJJAAAAhrZN4VsDAACAKVQkAQAAGNo2hYokAAAATKEiCQAAwBxJU/jWAAAAYAoVSQAAAOZImkJFEgAAAKZQkQQAAGCOpCkkkgAAAAxtm0L6DQAAAFOoSAIAADC0bQrfGgAAAEyhIgkAAEBF0hS+NQAAAJhCRRIAAKAUq7bNoCIJAAAAU6hIAgAAMEfSFBJJAAAANiQ3hfQbAAAAplCRBAAAYGjbFL41AAAAmEJFEgAAgDmSplCRBAAAgClUJAEAAJgjaQrfGgAAAEyhIgkAAMAcSVNIJAEAABjaNoVvDQAAAKaQSAIAAFgsjjv+gbi4OFksFsXGxtraDMPQ5MmTFR4eLj8/P7Vu3Vp79uyxuy47O1ujRo1S+fLlVaZMGfXo0UOHDx+265OWlqaYmBhZrVZZrVbFxMTo9OnTRYqPRBIAAMAFbdmyRe+++64aNGhg1z5t2jS99tprmjlzprZs2aKwsDB16NBBZ8+etfWJjY3VkiVLtGjRIq1fv14ZGRnq1q2bcnNzbX369eunpKQkJSQkKCEhQUlJSYqJiSlSjCSSAAAAllKOO0zIyMhQ//79NWfOHAUGBtraDcPQjBkz9NRTT+mee+5RvXr1tGDBAp07d04fffSRJCk9PV1z587Vq6++qvbt26tRo0b68MMPtWvXLn333XeSpOTkZCUkJOi9995TVFSUoqKiNGfOHC1fvlx79+4tdJwkkgAAAA6UnZ2tM2fO2B3Z2dnXvGbEiBG6++671b59e7v2/fv3KzU1VR07drS1+fj4qFWrVtqwYYMkadu2bcrJybHrEx4ernr16tn6bNy4UVarVc2bN7f1adGihaxWq61PYZBIAgAAOHCOZFxcnG0e4uUjLi7uqqEsWrRI27dvL7BPamqqJCk0NNSuPTQ01HYuNTVV3t7edpXMgvqEhITku39ISIitT2Gw/Q8AAIADTZw4UWPGjLFr8/HxKbDvoUOH9Nhjj+nbb7+Vr6/vVe9puWIRj2EY+dqudGWfgvoX5j5/R0USAADAgXMkfXx8FBAQYHdcLZHctm2bjh07psaNG8vT01Oenp5au3at3njjDXl6etoqkVdWDY8dO2Y7FxYWpgsXLigtLe2afY4ePZrv+cePH89X7bwWEkkAAAAXWWzTrl077dq1S0lJSbajSZMm6t+/v5KSklS9enWFhYVp5cqVtmsuXLigtWvXKjo6WpLUuHFjeXl52fVJSUnR7t27bX2ioqKUnp6uzZs32/ps2rRJ6enptj6FwdA2AACAi/D391e9evXs2sqUKaPg4GBbe2xsrKZMmaJatWqpVq1amjJlikqXLq1+/fpJkqxWqwYPHqyxY8cqODhYQUFBGjdunOrXr29bvFOnTh117txZQ4YM0ezZsyVJQ4cOVbdu3RQZGVno
eEkkAQAAbqLf2p4wYYKysrI0fPhwpaWlqXnz5vr222/l7+9v6zN9+nR5enqqd+/eysrKUrt27TR//nx5eHjY+sTHx2v06NG21d09evTQzJkzixSLxTAMo3hey3X4NR/v7BAAOMiJddOcHQIABynj7bxkzq/HLIfdO2vpow67t7NRkQQAADC5cbi741sDAACAKVQkAQAAbqI5kq6EiiQAAABMoSIJAADAHElTSCQBAAAY2jaF9BsAAACmUJEEAABuz0JF0hQqkgAAADCFiiQAAHB7VCTNoSIJAAAAU6hIAgAAUJA0hYokAAAATKEiCQAA3B5zJM0hkQQAAG6PRNIchrYBAABgChVJAADg9qhImkNFEgAAAKZQkQQAAG6PiqQ5VCQBAABgChVJAAAACpKmUJEEAACAKVQkAQCA22OOpDlUJAEAAGAKFUkAAOD2qEiaQyIJAADcHomkOQxtAwAAwBQqkgAAwO1RkTSHiiQAAABMoSIJAABAQdIUKpIAAAAwhYokAABwe8yRNIeKJAAAAEyhIgkAANweFUlzSCQBAIDbI5E0h6FtAAAAmEJFEgAAgIKkKVQkAQAAYAoVSQAA4PaYI2kOFUkAAACY4jKJ5Lp16/TAAw8oKipKf/31lyRp4cKFWr9+vZMjAwAAJZ3FYnHYUZK5RCL52WefqVOnTvLz89OOHTuUnZ0tSTp79qymTJni5OgAAABQEJdIJF988UW98847mjNnjry8vGzt0dHR2r59uxMjAwAA7oCKpDkusdhm7969uuuuu/K1BwQE6PTp0zc+IAAA4FZKesLnKC5RkaxYsaJ+++23fO3r169X9erVnRARAAAArsclEslHHnlEjz32mDZt2iSLxaIjR44oPj5e48aN0/Dhw50dHgAAKOksDjxKMJcY2p4wYYLS09PVpk0bnT9/XnfddZd8fHw0btw4jRw50tnhAQAAoAAukUhK0ksvvaSnnnpKP//8s/Ly8lS3bl2VLVvW2WEBAAA3wBxJc1xiaHvBggXKzMxU6dKl1aRJEzVr1owkEgAAwMW5RCI5btw4hYSE6P7779fy5ct18eJFZ4cEAADcCNv/mOMSiWRKSooWL14sDw8P3X///apYsaKGDx+uDRs2ODs0AAAAXIVLJJKenp7q1q2b4uPjdezYMc2YMUN//vmn2rRpoxo1ajg7PAAAUMK5SkVy1qxZatCggQICAhQQEKCoqCh9/fXXtvMDBw7Md/8WLVrY3SM7O1ujRo1S+fLlVaZMGfXo0UOHDx+265OWlqaYmBhZrVZZrVbFxMSY2rvbJRLJvytdurQ6deqkLl26qFatWjpw4ICzQwIAACWdi2z/c8stt+jll1/W1q1btXXrVrVt21Y9e/bUnj17bH06d+6slJQU27FixQq7e8TGxmrJkiVatGiR1q9fr4yMDHXr1k25ubm2Pv369VNSUpISEhKUkJCgpKQkxcTEFC1YudCq7XPnzmnJkiWKj4/Xd999p4iICPXt21f//e9/nR0aAADADdG9e3e7zy+99JJmzZqlxMRE3XrrrZIkHx8fhYWFFXh9enq65s6dq4ULF6p9+/aSpA8//FARERH67rvv1KlTJyUnJyshIUGJiYlq3ry5JGnOnDmKiorS3r17FRkZWeh4XSKR7Nu3r5YtW6bSpUvrvvvu05o1axQdHe3ssAAAgJtw5KKY7OxsZWdn27X5+PjIx8fnmtfl5ubqv//9rzIzMxUVFWVrX7NmjUJCQlSuXDm1atVKL730kkJCQiRJ27ZtU05Ojjp27GjrHx4ernr16mnDhg3q1KmTNm7cKKvVaksiJalFixayWq3asGFDkRJJlxjatlgsWrx4sY4cOaK33nqLJBIAAJQYcXFxtrmIl4+4uLir9t+1a5fKli0rHx8fDRs2TEuWLFHdunUlSV26dFF8fLxWrVqlV199VVu2bFHbtm1tiWpqaqq8vb0VGBhod8/Q0FClpqba+lxOPP8uJCTE1qewXKIi+dFH
Hzk7BAAA4MYcWZGcOHGixowZY9d2rWpkZGSkkpKSdPr0aX322WcaMGCA1q5dq7p166pPnz62fvXq1VOTJk1UpUoVffXVV7rnnnuuek/DMOzesaD3vbJPYTgtkXzjjTc0dOhQ+fr66o033rhm39GjR9+gqAAAAIpXYYax/87b21s1a9aUJDVp0kRbtmzR66+/rtmzZ+frW7FiRVWpUkX79u2TJIWFhenChQtKS0uzq0oeO3bMNuIbFhamo0eP5rvX8ePHFRoaWqR3c1oiOX36dPXv31++vr6aPn36VftZLBYSSTcybkAbvTC8q2YuWqfx05fK06OUJg/rrE7R/1K1SsE6k5GlVVt+0zNvrVDKiTN21zavV0WTH+2sprdWVs7FXO389Yh6Pv6ezmdfVOWKgZo4qL1aN6mp0CB/pZw4o48TtmvqvO+VczH3KtEAKG53d2qrlCNH8rXf16efxj0xUW+/+bp+XLdWh/86rLJly6p5i2iNjh2jCiH/9y+3z/67WAkrluuX5J+VmZmptT9uln9AwI18DZRArrxxuGEY+eZYXnby5EkdOnRIFStWlCQ1btxYXl5eWrlypXr37i3p0n7du3fv1rRp0yRJUVFRSk9P1+bNm9WsWTNJ0qZNm5Senl7k6YVOSyT3799f4J/hvhrXuUWDe7XQzn3/9y+Z0r7eahhZSS+//5127ktRYICf/vfxHvrvKwN1x8D/q2Q3r1dFX74+WK8sWK0xr3yhCxdz1aBWReXlGZKkyCohKlXKopEvf6bfD53QrTXC9NaT/1EZP29NfGP5DX9XwF19+PGnys37v/94+33fPj06dJA6dOqk8+fP65fkn/XwI8NVOzJSZ86c0SvT4hQ7arjiF39mu+b8+fOKbnmnolveqTdff80ZrwE4zJNPPqkuXbooIiJCZ8+e1aJFi7RmzRolJCQoIyNDkydP1r333quKFSvqwIEDevLJJ1W+fHn9+9//liRZrVYNHjxYY8eOVXBwsIKCgjRu3DjVr1/ftoq7Tp066ty5s4YMGWKrcg4dOlTdunUr0kIbyUXmSD7//PMaN26cSpcubdeelZWl//3f/9Wzzz7rpMhwo5Tx89a85/tp+JRP9T8PtbO1n8k8r26j59j1HfPKF1o//zFFhJbToaOnJUnTHu+utz/5Ua98sNrW7/dDJ2x/Xpm4VysT99o+HzhySrXj12rIPVEkksANFBgUZPd53tw5uiWisho3aSaLxaJZc963O//ExKcV0/c+paQcUcWK4ZKk/jEDJElbt2y6MUHDLbhKRfLo0aOKiYlRSkqKrFarGjRooISEBHXo0EFZWVnatWuXPvjgA50+fVoVK1ZUmzZttHjxYvn7+9vuMX36dHl6eqp3797KyspSu3btNH/+fHl4eNj6xMfHa/To0bbV3T169NDMmTOLHK9LJJLPPfechg0bli+RPHfunJ577jkSSTcwY/y/lfBjslZv2WeXSBYkoKyf8vLydDojS5JUIbCMmtWrokUJO7R6zghVuyVYvx44psnvJGjDTweufp8yvjp15lxxvgaAIsjJuaCvly9V/wcHXvVf4hlnz8piscjfn6FrOJhr5JGaO3fuVc/5+fnpm2++ue49fH199eabb+rNN9+8ap+goCB9+OGHpmL8O5fY/udqq4R++uknBV3xX69Xys7O1pkzZ+wOI++io0KFA9zX4TY1jKykZ97++rp9fbw99cKILlr8TZLOZl6aL1KtUrAk6akhHfT+l5vU87H3lLT3L62Y+YhqRJQv8D7VKgXr0d4t9d7nicX3IgCKZPX33+vs2bPq0fPfBZ7Pzs7WGzNeVeeu3VS2bNkbHB2AwnBqRTIwMND2O5G1a9e2SyZzc3OVkZGhYcOGXfMecXFxeu655+zaPMKj5HVLS4fEjOJ1S4hV/zump7qPnqPsC9f+DwBPj1Ja+GJ/lbJY9Nj/fm5rL/X///9m7pJELVy+VZL0069H1LpJLQ3o3lTPXpGgViwfoKWv
D9bn3+/U/KWbi/mNABTWF0s+VfQdd9otpLksJydHE8ePkWEYmvj0JCdEB3fjKkPbNxunJpIzZsyQYRgaNGiQnnvuOVmtVts5b29vVa1a1W4n94IUtDdTSDv+R+dm0ehftyg0yF8b5j9ma/P09NAdjapp2H+iZb1zovLyDHl6lFL8lBhVCQ9Sl+GzbdVISbbV28n7j9nde++Bo4oILWfXVrF8gBLefkSbdh3UiLjPBMA5jhz5S5sTN+qV6fmH3nJycvQ/4x7XX38d1uy586lGAi7MqYnkgAGXJkxXq1ZN0dHR8vLyKvI9CtqbyVLKJaZ+ohBWb/1Njfu+Ytf27jN9tPfPY3r1g9V2SWSNiPLqPPydfPMa/0xJ05Fj6apdpYJde83KFfTtxl9sn8MrBCjh7WHa8cthDX1hsQzDcNyLAbimpV98rqCgYN1xVyu79stJ5MGDf+rduQtUrlzgVe4AFC8qkuY4LeM6c+aMAv7/vl+NGjVSVlaWsrKyCuwbwP5gJVbGuWz9/If9pqiZWRd0Kv2cfv7jqDw8Sumjlx9Uo8hKumfs+/IoVUqhQZdWpp06c862B+T0+DV6ekhH7dp3RD/9ekQP3N1EkVVC1G/iQkmXKpHfzBqmQ6mnNfGN5apQ7v8qHEdPnb1BbwtAkvLy8rT0iyXq1qOXPD3/719DFy9e1IQxj+mX5J/1+lvvKDcvVydOHJd0aUsTLy9vSdKJE8d18sQJHTp4UJK0b9+vKlOmjMIqVpTVWu6Gvw/gzpyWSAYGBiolJcX2o+PX+qme3Fw2jHZXlUKs6n7XrZKkzR/aT2Ho+Ogsrdv+hyRp5qL18vX20rTYHgoMKK1d+46o2+h3tf+vk5Kkds1rq2ZEBdWMqKDflz9jdx+/5uNvwJsAuGxT4galphxRz3/b/5zbsaOpWrtmlSTp/v/0sjv37vsL1KRpc0nSp58s0ruz3rKde3jgA5KkyS9MUY9eV/+JOOBaKEiaYzGcNL63du1atWzZUp6enlq7du01+7Zq1eqa569EYgCUXCfWTXN2CAAcpIy387K5muOuv3OIWb+90sVh93Y2p1Uk/54cFjVRBAAAKE7MkTTHJfaRTEhI0Pr1622f33rrLTVs2FD9+vVTWlqaEyMDAADuwGJx3FGSuUQiOX78eJ05c2kLl127dmnMmDHq2rWr/vjjj3xb+wAAAMA1uMQ+Ofv371fdunUlSZ999pm6d++uKVOmaPv27eratauTowMAACUdQ9vmuERF0tvbW+fOXdob8LvvvrP9gHhQUJCtUgkAAADX4hIVyTvuuENjxoxRy5YttXnzZi1evFiS9Ouvv+qWW25xcnQAAKCkoyBpjktUJGfOnClPT099+umnmjVrlipVqiRJ+vrrr9W5c2cnRwcAAICCuERFsnLlylq+fHm+9unTpzshGgAA4G5KlaIkaYZLJJKSlJubqy+++ELJycmyWCyqU6eOevbsKQ8PD2eHBgAAgAK4RCL522+/qWvXrvrrr78UGRkpwzD066+/KiIiQl999ZVq1Kjh7BABAEAJxhxJc1xijuTo0aNVo0YNHTp0SNu3b9eOHTt08OBBVatWTaNHj3Z2eAAAoISzWCwOO0oyl6hIrl27VomJiQoKCrK1BQcH6+WXX1bLli2dGBkAAACuxiUSSR8fH509ezZfe0ZGhry9vZ0QEQAAcCclvHDoMC4xtN2tWzcNHTpUmzZtkmEYMgxDiYmJGjZsmHr06OHs8AAAAFAAl0gk33jjDdWoUUNRUVHy9fWVr6+voqOjVbNmTb3++uvODg8AAJRwzJE0xyWGtsuVK6cvv/xSv/32m37++WdJUt26dVWzZk0nRwYAAICrcYlEUpLmzp2r6dOna9++fZKkWrVqKTY2Vg8//LCTIwMAACVdSa8cOopLJJLPPPOMpk+frlGjRikqKkqStHHjRj3++OM6cOCAXnzxRSdHCAAAgCu5RCI5a9YszZkzR3379rW19ejRQw0aNNCoUaNI
JAEAgENRkDTHJRLJ3NxcNWnSJF9748aNdfHiRSdEBAAA3AlD2+a4xKrtBx54QLNmzcrX/u6776p///5OiAgAAADX4xIVSenSYptvv/1WLVq0kCQlJibq0KFDevDBBzVmzBhbv9dee81ZIQIAgBKKgqQ5LpFI7t69W7fffrsk6ffff5ckVahQQRUqVNDu3btt/Sg7AwAAuA6XSCRXr17t7BAAAIAbo1hljkvMkQQAAMDNxyUqkgAAAM5EQdIcKpIAAAAwhYokAABwe8yRNIeKJAAAAEyhIgkAANweBUlzSCQBAIDbY2jbHIa2AQAAYAoVSQAA4PYoSJpDRRIAAACmUJEEAABujzmS5lCRBAAAgClUJAEAgNujIGkOFUkAAACYQkUSAAC4PeZImkMiCQAA3B55pDkMbQMAAMAUKpIAAMDtMbRtDhVJAAAAmEJFEgAAuD0qkuZQkQQAAIApJJIAAMDtWSyOO4pi1qxZatCggQICAhQQEKCoqCh9/fXXtvOGYWjy5MkKDw+Xn5+fWrdurT179tjdIzs7W6NGjVL58uVVpkwZ9ejRQ4cPH7brk5aWppiYGFmtVlmtVsXExOj06dNF/t5IJAEAAFzELbfcopdffllbt27V1q1b1bZtW/Xs2dOWLE6bNk2vvfaaZs6cqS1btigsLEwdOnTQ2bNnbfeIjY3VkiVLtGjRIq1fv14ZGRnq1q2bcnNzbX369eunpKQkJSQkKCEhQUlJSYqJiSlyvBbDMIx//tquxa/5eGeHAMBBTqyb5uwQADhIGW/nzVNsPWODw+79zaONlZ2dbdfm4+MjHx+fQl0fFBSk//3f/9WgQYMUHh6u2NhYPfHEE5IuVR9DQ0M1depUPfLII0pPT1eFChW0cOFC9enTR5J05MgRRUREaMWKFerUqZOSk5NVt25dJSYmqnnz5pKkxMRERUVF6ZdfflFkZGSh342KJAAAcHuOHNqOi4uzDSFfPuLi4q4bU25urhYtWqTMzExFRUVp//79Sk1NVceOHW19fHx81KpVK23YcCkR3rZtm3Jycuz6hIeHq169erY+GzdulNVqtSWRktSiRQtZrVZbn8Ji1TYAAIADTZw4UWPGjLFru1Y1cteuXYqKitL58+dVtmxZLVmyRHXr1rUleaGhoXb9Q0ND9eeff0qSUlNT5e3trcDAwHx9UlNTbX1CQkLyPTckJMTWp7BIJAEAgNtz5PY/RRnGlqTIyEglJSXp9OnT+uyzzzRgwACtXbvWdv7KWA3DuG78V/YpqH9h7nMlhrYBAABciLe3t2rWrKkmTZooLi5Ot912m15//XWFhYVJUr6q4bFjx2xVyrCwMF24cEFpaWnX7HP06NF8zz1+/Hi+auf1kEgCAAC35yrb/xTEMAxlZ2erWrVqCgsL08qVK23nLly4oLVr1yo6OlqS1LhxY3l5edn1SUlJ0e7du219oqKilJ6ers2bN9v6bNq0Senp6bY+hcXQNgAAgIt48skn1aVLF0VEROjs2bNatGiR1qxZo4SEBFksFsXGxmrKlCmqVauWatWqpSlTpqh06dLq16+fJMlqtWrw4MEaO3asgoODFRQUpHHjxql+/fpq3769JKlOnTrq3LmzhgwZotmzZ0uShg4dqm7duhVpxbZEIgkAAKBSLvITiUePHlVMTIxSUlJktVrVoEEDJSQkqEOHDpKkCRMmKCsrS8OHD1daWpqaN2+ub7/9Vv7+/rZ7TJ8+XZ6enurdu7eysrLUrl07zZ8/Xx4eHrY+8fHxGj16tG11d48ePTRz5swix8s+kgBuKuwjCZRcztxHssPMRIfde+XIFg67t7NRkQQAAG7PRQqSNx0SSQAA4PYcuf1PScaqbQAAAJhCRRIAALi9UhQkTaEiCQAAAFOoSAIAALfHHElzqEgCAADAFCqSAADA7VGQNIeKJAAAAEyhIgkAANyeRZQkzSCRBAAAbo/tf8xhaBsAAACmUJEEAABuj+1/zKEiCQAAAFOoSAIA
ALdHQdIcKpIAAAAwpVgqkqdPn1a5cuWK41YAAAA3XClKkqYUuSI5depULV682Pa5d+/eCg4OVqVKlfTTTz8Va3AAAABwXUVOJGfPnq2IiAhJ0sqVK7Vy5Up9/fXX6tKli8aPH1/sAQIAADiaxeK4oyQr8tB2SkqKLZFcvny5evfurY4dO6pq1apq3rx5sQcIAADgaGz/Y06RK5KBgYE6dOiQJCkhIUHt27eXJBmGodzc3OKNDgAAAC6ryBXJe+65R/369VOtWrV08uRJdenSRZKUlJSkmjVrFnuAAAAAjkZB0pwiJ5LTp09X1apVdejQIU2bNk1ly5aVdGnIe/jw4cUeIAAAAFxTkRNJLy8vjRs3Ll97bGxsccQDAABww7H9jzmFSiSXLl1a6Bv26NHDdDAAAAC4eRQqkezVq1ehbmaxWFhwAwAAbjrUI80pVCKZl5fn6DgAAABwk/lHP5F4/vx5+fr6FlcsAAAATsE+kuYUeR/J3NxcvfDCC6pUqZLKli2rP/74Q5L0zDPPaO7cucUeIAAAgKOVsjjuKMmKnEi+9NJLmj9/vqZNmyZvb29be/369fXee+8Va3AAAABwXUVOJD/44AO9++676t+/vzw8PGztDRo00C+//FKswQEAANwIFovFYUdJVuRE8q+//irwF2zy8vKUk5NTLEEBAADA9RU5kbz11lu1bt26fO3//e9/1ahRo2IJCgAA4EayWBx3lGRFXrU9adIkxcTE6K+//lJeXp4+//xz7d27Vx988IGWL1/uiBgBAADggopckezevbsWL16sFStWyGKx6Nlnn1VycrKWLVumDh06OCJGAAAAh2KOpDmm9pHs1KmTOnXqVNyxAAAA4CZiekPyrVu3Kjk5WRaLRXXq1FHjxo2LMy4AAIAbpqTv9+goRU4kDx8+rL59++rHH39UuXLlJEmnT59WdHS0Pv74Y0VERBR3jAAAAA5V0oegHaXIcyQHDRqknJwcJScn69SpUzp16pSSk5NlGIYGDx7siBgBAADggopckVy3bp02bNigyMhIW1tkZKTefPNNtWzZsliDAwAAuBGoR5pT5Ipk5cqVC9x4/OLFi6pUqVKxBAUAAADXV+REctq0aRo1apS2bt0qwzAkXVp489hjj+mVV14p9gABAAAcrZTF4rCjJCvU0HZgYKDdJNTMzEw1b95cnp6XLr948aI8PT01aNAg9erVyyGBAgAAwLUUKpGcMWOGg8MAAABwnhJeOHSYQiWSAwYMcHQcAAAAuMmY3pBckrKysvItvAkICPhHAQEAANxo7CNpTpEX22RmZmrkyJEKCQlR2bJlFRgYaHcAAADAPRQ5kZwwYYJWrVqlt99+Wz4+Pnrvvff03HPPKTw8XB988IEjYgQAAHAoi8VxR0lW5KHtZcuW6YMPPlDr1q01aNAg3XnnnapZs6aqVKmi+Ph49e/f3xFxAgAAOExJ36bHUYpckTx16pSqVasm6dJ8yFOnTkmS7rjjDv3www/FGx0AAABcVpETyerVq+vAgQOSpLp16+qTTz6RdKlSWa5cueKMDQAA4IZgaNucIieSDz30kH766SdJ0sSJE21zJR9//HGNHz++2AMEAABwF3FxcWratKn8/f0VEhKiXr16ae/evXZ9Bg4cKIvFYne0aNHCrk92drZGjRql8uXLq0yZMurRo4cOHz5s1yctLU0xMTGyWq2yWq2KiYnR6dOnixRvkedIPv7447Y/t2nTRr/88ou2bt2qGjVq6Lbbbivq7QAAAJzOVbb/Wbt2rUaMGKGmTZvq4sWLeuqpp9SxY0f9/PPPKlOmjK1f586dNW/ePNtnb29vu/vExsZq2bJlWrRokYKDgzV27Fh169ZN27Ztk4eHhySpX79+Onz4sBISEiRJQ4cOVUxMjJYtW1boeP/RPpKSVLlyZVWuXFmHDh3SoEGD9P777//TWwIAALily0ndZfPmzVNISIi2bdumu+66y9bu4+OjsLCwAu+Rnp6uuXPn
auHChWrfvr0k6cMPP1RERIS+++47derUScnJyUpISFBiYqKaN28uSZozZ46ioqK0d+9eRUZGFiref5xIXnbq1CktWLDAJRLJtB//19khAHCQwKYjnR0CAAfJ2jHTac8u8ly/IsjOzlZ2drZdm4+Pj3x8fK57bXp6uiQpKCjIrn3NmjUKCQlRuXLl1KpVK7300ksKCQmRJG3btk05OTnq2LGjrX94eLjq1aunDRs2qFOnTtq4caOsVqstiZSkFi1ayGq1asOGDYVOJB35vQEAALi9uLg42zzEy0dcXNx1rzMMQ2PGjNEdd9yhevXq2dq7dOmi+Ph4rVq1Sq+++qq2bNmitm3b2pLV1NRUeXt75/uhmNDQUKWmptr6XE48/y4kJMTWpzCKrSIJAABws3LkHMmJEydqzJgxdm2FqUaOHDlSO3fu1Pr16+3a+/TpY/tzvXr11KRJE1WpUkVfffWV7rnnnqvezzAMu/cs6J2v7HM9JJIAAMDtlXLgWpvCDmP/3ahRo7R06VL98MMPuuWWW67Zt2LFiqpSpYr27dsnSQoLC9OFCxeUlpZmV5U8duyYoqOjbX2OHj2a717Hjx9XaGhooeMsdCJ5rQxXUpGXiwMAAMCeYRgaNWqUlixZojVr1th+BOZaTp48qUOHDqlixYqSpMaNG8vLy0srV65U7969JUkpKSnavXu3pk2bJkmKiopSenq6Nm/erGbNmkmSNm3apPT0dFuyWRiFTiStVut1zz/44IOFfjAAAICrcGRFsihGjBihjz76SF9++aX8/f1t8xWtVqv8/PyUkZGhyZMn695771XFihV14MABPfnkkypfvrz+/e9/2/oOHjxYY8eOVXBwsIKCgjRu3DjVr1/ftoq7Tp066ty5s4YMGaLZs2dLurT9T7du3Qq90EYqQiL5972KAAAAUPxmzZolSWrdurVd+7x58zRw4EB5eHho165d+uCDD3T69GlVrFhRbdq00eLFi+Xv72/rP336dHl6eqp3797KyspSu3btNH/+fNsekpIUHx+v0aNH21Z39+jRQzNnFm3lvMUwDMPku7qs8xedHQEAR2H7H6Dkcub2P2OX7b1+J5Ne7V74Ct/Nhu1/AAAAYAqrtgEAgNtzlTmSNxsqkgAAADCFiiQAAHB7DtyPvEQzVZFcuHChWrZsqfDwcP3555+SpBkzZujLL78s1uAAAABuhFIWi8OOkqzIieSsWbM0ZswYde3aVadPn1Zubq4kqVy5cpoxY0ZxxwcAAAAXVeRE8s0339ScOXP01FNP2e1F1KRJE+3atatYgwMAALgRSjnwKMmK/H779+9Xo0aN8rX7+PgoMzOzWIICAACA6ytyIlmtWjUlJSXla//6669Vt27d4ogJAADghrJYHHeUZEVetT1+/HiNGDFC58+fl2EY2rx5sz7++GPFxcXpvffec0SMAAAAcEFFTiQfeughXbx4URMmTNC5c+fUr18/VapUSa+//rruv/9+R8QIAADgUCV9dbWjmNpHcsiQIRoyZIhOnDihvLw8hYSEFHdcAAAAcHH/aEPy8uXLF1ccAAAATkNB0pwiJ5LVqlWT5Rrf9h9//PGPAgIAALjR+K1tc4qcSMbGxtp9zsnJ0Y4dO5SQkKDx48cXV1wAAABwcUVOJB977LEC29966y1t3br1HwcEAABwo7HYxpxi23C9S5cu+uyzz4rrdgAAAHBx/2ixzd99+umnCgoKKq7bAQAA3DAUJM0pciLZqFEju8U2hmEoNTVVx48f19tvv12swQEAAMB1FTmR7NWrl93nUqVKqUKFCmrdurX+9a9/FVdcAAAANwyrts0pUiJ58eJFVa1aVZ06dVJYWJijYgIAAMBNoEiLbTw9PfXoo48qOzvbUfEAAADccBYH/lOSFXnVdvPmzbVjxw5HxAIAAOAUpSyOO0qyIs+RHD58uMaOHavDhw+rcePGKlOmjN35Bg0aFFtwAAAAcF2FTiQHDRqkGTNmqE+fPpKk0aNH
285ZLBYZhiGLxaLc3NzijxIAAMCBSnrl0FEKnUguWLBAL7/8svbv3+/IeAAAAHCTKHQiaRiGJKlKlSoOCwYAAMAZLOxIbkqRFtvwJQMAAOCyIi22qV279nWTyVOnTv2jgAAAAG405kiaU6RE8rnnnpPVanVULAAAALiJFCmRvP/++xUSEuKoWAAAAJyC2XvmFDqRZH4kAAAoqUqR55hS6MU2l1dtAwAAAFIRKpJ5eXmOjAMAAMBpWGxjTpF/axsAAACQTPzWNgAAQEnDFElzqEgCAADAFCqSAADA7ZUSJUkzqEgCAADAFCqSAADA7TFH0hwSSQAA4PbY/scchrYBAABgChVJAADg9viJRHOoSAIAAMAUKpIAAMDtUZA0h4okAAAATKEiCQAA3B5zJM2hIgkAAABTqEgCAAC3R0HSHBJJAADg9hiiNYfvDQAAAKaQSAIAALdnsVgcdhRFXFycmjZtKn9/f4WEhKhXr17au3evXR/DMDR58mSFh4fLz89PrVu31p49e+z6ZGdna9SoUSpfvrzKlCmjHj166PDhw3Z90tLSFBMTI6vVKqvVqpiYGJ0+fbpI8ZJIAgAAuIi1a9dqxIgRSkxM1MqVK3Xx4kV17NhRmZmZtj7Tpk3Ta6+9ppkzZ2rLli0KCwtThw4ddPbsWVuf2NhYLVmyRIsWLdL69euVkZGhbt26KTc319anX79+SkpKUkJCghISEpSUlKSYmJgixWsxDMP456/tWs5fdHYEABwlsOlIZ4cAwEGydsx02rM/2HrIYfd+sEmE6WuPHz+ukJAQrV27VnfddZcMw1B4eLhiY2P1xBNPSLpUfQwNDdXUqVP1yCOPKD09XRUqVNDChQvVp08fSdKRI0cUERGhFStWqFOnTkpOTlbdunWVmJio5s2bS5ISExMVFRWlX375RZGRkYWKj4okAACAA2VnZ+vMmTN2R3Z2dqGuTU9PlyQFBQVJkvbv36/U1FR17NjR1sfHx0etWrXShg0bJEnbtm1TTk6OXZ/w8HDVq1fP1mfjxo2yWq22JFKSWrRoIavVautTGCSSAADA7ZWyWBx2xMXF2eYhXj7i4uKuG5NhGBozZozuuOMO1atXT5KUmpoqSQoNDbXrGxoaajuXmpoqb29vBQYGXrNPSEhIvmeGhITY+hQG2/8AAAA40MSJEzVmzBi7Nh8fn+teN3LkSO3cuVPr16/Pd+7KRTyGYVx3Yc+VfQrqX5j7/B0VSQAA4PYsDjx8fHwUEBBgd1wvkRw1apSWLl2q1atX65ZbbrG1h4WFSVK+quGxY8dsVcqwsDBduHBBaWlp1+xz9OjRfM89fvx4vmrntZBIAgAAt2exOO4oCsMwNHLkSH3++edatWqVqlWrZne+WrVqCgsL08qVK21tFy5c0Nq1axUdHS1Jaty4sby8vOz6pKSkaPfu3bY+UVFRSk9P1+bNm219Nm3apPT0dFufwmBoGwAAwEWMGDFCH330kb788kv5+/vbKo9Wq1V+fn6yWCyKjY3VlClTVKtWLdWqVUtTpkxR6dKl1a9fP1vfwYMHa+zYsQoODlZQUJDGjRun+vXrq3379pKkOnXqqHPnzhoyZIhmz54tSRo6dKi6detW6BXbEokkAABAkTcOd5RZs2ZJklq3bm3XPm/ePA0cOFCSNGHCBGVlZWn48OFKS0tT8+bN9e2338rf39/Wf/r06fL09FTv3r2VlZWldu3aaf78+fLw8LD1iY+P1+jRo22ru3v06KGZM4u2BRP7SAK4qbCPJFByOXMfyY93/OWwe/dtVMlh93Y2KpIAAMDtsWjEHL43AAAAmEJFEgAAuD1XmSN5s6EiCQAAAFOoSAIAALdHPdIcKpIAAAAwhYokAABwe8yRNIdEEgAAuD2GaM3hewMAAIApVCQBAIDbY2jbHCqSAAAAMIWKJAAAcHvUI82hIgkAAABTqEgCAAC3xxRJc6hIAgAAwBQqkgAAwO2VYpakKSSSAADA7TG0bQ5D2wAA
ADDFpRLJCxcuaO/evbp48aKzQwEAAG7E4sB/SjKXSCTPnTunwYMHq3Tp0rr11lt18OBBSdLo0aP18ssvOzk6AAAAFMQlEsmJEyfqp59+0po1a+Tr62trb9++vRYvXuzEyAAAgDuwWBx3lGQusdjmiy++0OLFi9WiRQu737qsW7eufv/9dydGBgAAgKtxiUTy+PHjCgkJydeemZnJj6gDAACHY/sfc1xiaLtp06b66quvbJ8vJ49z5sxRVFSUs8ICAADANbhERTIuLk6dO3fWzz//rIsXL+r111/Xnj17tHHjRq1du9bZ4QEAgBKOAVBzXKIiGR0drR9//FHnzp1TjRo19O233yo0NFQbN25U48aNnR0eAAAo4VhsY45LVCQlqX79+lqwYIGzwwAAAEAhuURFsk2bNpo7d67S09OdHQoAAHBDbEhujkskkvXr19fTTz+tsLAw3Xvvvfriiy904cIFZ4cFAACAa3CJRPKNN97QX3/9pS+//FL+/v4aMGCAwsLCNHToUBbbAAAAhytlcdxRkrlEIilJpUqVUseOHTV//nwdPXpUs2fP1ubNm9W2bVtnhwYAAIACuMxim8tSU1O1aNEiffjhh9q5c6eaNm3q7JAAAEAJV9LnMjqKS1Qkz5w5o3nz5qlDhw6KiIjQrFmz1L17d/3666/atGmTs8MDAABAAVyiIhkaGqrAwED17t1bU6ZMoQoJAABuqJK+36OjuEQi+eWXX6p9+/YqVcolCqQAAMDNMLRtjkskkh07dnR2CAAAACgipyWSt99+u77//nsFBgaqUaNGslyjprx9+/YbGBkAAHA3JX2bHkdxWiLZs2dP+fj42P58rUQSAAAArsdiGIbh7CCK2/mLzo4AgKMENh3p7BAAOEjWjplOe/a6X9Mcdu87awc67N7O5hKrW6pXr66TJ0/maz99+rSqV6/uhIgAAABwPS6x2ObAgQPKzc3N156dna3Dhw87ISIAAOBOmGFnjlMTyaVLl9r+/M0338hqtdo+5+bm6vvvv1e1atWcERoAAACuw6mJZK9evSRJFotFAwYMsDvn5eWlqlWr6tVXX3VCZAAAwJ1QkDTHqYlkXl6eJKlatWrasmWLypcv78xwAACAmyrF2LYpLjFHcv/+/aavzc7OVnZ2tl2b4eFj21oIAAAAjuESiaQkZWZmau3atTp48KAuXLhgd2706NFXvS4uLk7PPfecXdtTz0zS089OdkSYAACgBKIeaY5L7CO5Y8cOde3aVefOnVNmZqaCgoJ04sQJlS5dWiEhIfrjjz+uei0VScC9sI8kUHI5cx/JxN9OO+zeLWqWc9i9nc0l9pF8/PHH1b17d506dUp+fn5KTEzUn3/+qcaNG+uVV1655rU+Pj4KCAiwO0giAQBAkVgceJRgLpFIJiUlaezYsfLw8JCHh4eys7MVERGhadOm6cknn3R2eAAAACiASySSXl5ett/aDg0N1cGDByVJVqvV9mcAAABHsTjwn5LMJRbbNGrUSFu3blXt2rXVpk0bPfvsszpx4oQWLlyo+vXrOzs8AAAAFMAlKpJTpkxRxYoVJUkvvPCCgoOD9eijj+rYsWN69913nRwdAAAo6SwWxx0lmUskkk2aNFGbNm0kSRUqVNCKFSt05swZbd++XbfddpuTowMAACWdK621+eGHH9S9e3eFh4fLYrHoiy++sDs/cOBAWSwWu6NFixZ2fbKzszVq1CiVL19eZcqUUY8ePXT48GG7PmlpaYqJiZHVapXValVMTIxOnz5dpFhdIpEEAADAJZmZmbrttts0c+bVt0Pq3LmzUlJSbMeKFSvszsfGxmrJkiVatGiR1q9fr4yMDHXr1k25ubm2Pv369VNSUpISEhKUkJCgpKQkxcTEFClWl5kjaSmg9muxWOTr66uaNWtq4MCBtqolAABAsXKhIeguXbqoS5cu1+zj4+OjsLCwAs+lp6dr7ty5Wrhwodq3by9J+vDDDxUREaHvvvtOnTp1UnJy
shISEpSYmKjmzZtLkubMmaOoqCjt3btXkZGRhYrVJSqSnTt31h9//KEyZcqoTZs2at26tcqWLavff/9dTZs2VUpKitq3b68vv/zS2aECAAAUSXZ2ts6cOWN3XPljKkW1Zs0ahYSEqHbt2hoyZIiOHTtmO7dt2zbl5OSoY8eOtrbw8HDVq1dPGzZskCRt3LhRVqvVlkRKUosWLWS1Wm19CsMlEskTJ05o7NixWrdunV599VW99tpr+uGHHzRu3DhlZmbq22+/1dNPP60XXnjB2aECAIASyJHb/8TFxdnmIV4+4uLiTMfapUsXxcfHa9WqVXr11Ve1ZcsWtW3b1pacpqamytvbW4GBgXbXhYaGKjU11dYnJCQk371DQkJsfQrDJYa2P/nkE23bti1f+/3336/GjRtrzpw56tu3r1577TUnRAcAAGDexIkTNWbMGLu2f/IrfH369LH9uV69emrSpImqVKmir776Svfcc89VrzMMw24qYUHTCq/scz0uUZH09fUtsIy6YcMG+fr6SpLy8vL46UMAAOAQjtz+x9E/51yxYkVVqVJF+/btkySFhYXpwoULSktLs+t37NgxhYaG2vocPXo0372OHz9u61MYLlGRHDVqlIYNG6Zt27apadOmslgs2rx5s9577z3bTyR+8803atSokZMjBQAAcC0nT57UoUOHbHtyN27cWF5eXlq5cqV69+4tSUpJSdHu3bs1bdo0SVJUVJTS09O1efNmNWvWTJK0adMmpaenKzo6utDPthiGYRTz+5gSHx+vmTNnau/evZKkyMhIjRo1Sv369ZMkZWVl2VZxX8/5iw4NFYATBTYd6ewQADhI1o6rb3fjaNsPnHHYvW+vGlCk/hkZGfrtt98kXdrZ5rXXXlObNm0UFBSkoKAgTZ48Wffee68qVqyoAwcO6Mknn9TBgweVnJwsf39/SdKjjz6q5cuXa/78+QoKCtK4ceN08uRJbdu2TR4eHpIuzbU8cuSIZs+eLUkaOnSoqlSpomXLlhU6VpdJJIsTiSRQcpFIAiWXUxPJPx2YSFYpWiK5Zs2aArc8HDBggGbNmqVevXppx44dOn36tCpWrKg2bdrohRdeUEREhK3v+fPnNX78eH300UfKyspSu3bt9Pbbb9v1OXXqlEaPHq2lS5dKknr06KGZM2eqXLlyhY7VZRLJ06dP69NPP9Uff/yhcePGKSgoSNu3b1doaKgqVapUpHuRSAIlF4kkUHKRSN58XGKO5M6dO9W+fXtZrVYdOHBADz/8sIKCgrRkyRL9+eef+uCDD5wdIgAAKMEsrrQj+U3EJVZtjxkzRgMHDtS+ffvs5kB26dJFP/zwgxMjAwAAwNW4REVyy5Yttomef1epUqUibYoJAABgRhG2TsTfuERF0tfXV2fO5J+bsHfvXlWoUMEJEQEAAOB6XCKR7Nmzp55//nnl5ORIurTT+sGDB/U///M/uvfee50cHQAAKOksDjxKMpdIJF955RUdP35cISEhysrKUqtWrVSzZk2VLVtWL730krPDAwAAQAFcYo5kQECA1q9fr9WrV2vbtm3Ky8vT7bffrvbt2zs7NAAA4A5KeunQQVwikZSk77//Xt9//72OHTumvLw8/fLLL/roo48kSe+//76TowMAACUZ2/+Y4xKJ5HPPPafnn39eTZo0UcWKFWVh6RQAAIDLc4lE8p133tH8+fMVExPj7FAAAIAbooZljksstrlw4YKio6OdHQYAAACKwCUSyYcfftg2HxIAAOBGY/sfc1xiaPv8+fN699139d1336lBgwby8vKyO//aa685KTIAAABcjUskkjt37lTDhg0lSbt377Y7x8IbAADgcKQbprhEIrl69WpnhwAAAIAicolEEgAAwJnYR9Icl1hsAwAAgJsPFUkAAOD2WJJhDokkAABwe+SR5jC0DQAAAFOoSAIAAFCSNIWKJAAAAEyhIgkAANwe2/+YQ0USAAAAplCRBAAAbo/tf8yhIgkAAABT
qEgCAAC3R0HSHBJJAAAAMklTGNoGAACAKVQkAQCA22P7H3OoSAIAAMAUKpIAAMDtsf2POVQkAQAAYAoVSQAA4PYoSJpDRRIAAACmUJEEAACgJGkKiSQAAHB7bP9jDkPbAAAAMIWKJAAAcHts/2MOFUkAAACYQkUSAAC4PQqS5lCRBAAAgClUJAEAAChJmkJFEgAAAKZQkQQAAG6PfSTNIZEEAABuj+1/zGFoGwAAAKZQkQQAAG6PgqQ5VCQBAABgChVJAADg9pgjaQ4VSQAAAJhCIgkAACCLA4+i+eGHH9S9e3eFh4fLYrHoiy++sDtvGIYmT56s8PBw+fn5qXXr1tqzZ49dn+zsbI0aNUrly5dXmTJl1KNHDx0+fNiuT1pammJiYmS1WmW1WhUTE6PTp08XKVYSSQAAABeSmZmp2267TTNnzizw/LRp0/Taa69p5syZ2rJli8LCwtShQwedPXvW1ic2NlZLlizRokWLtH79emVkZKhbt27Kzc219enXr5+SkpKUkJCghIQEJSUlKSYmpkixWgzDMMy9pus6f9HZEQBwlMCmI50dAgAHydpRcOJ0I/x1+oLD7l2pnLfpay0Wi5YsWaJevXpJulSNDA8PV2xsrJ544glJl6qPoaGhmjp1qh555BGlp6erQoUKWrhwofr06SNJOnLkiCIiIrRixQp16tRJycnJqlu3rhITE9W8eXNJUmJioqKiovTLL78oMjKyUPFRkQQAAG7PkQPb2dnZOnPmjN2RnZ1tKs79+/crNTVVHTt2tLX5+PioVatW2rBhgyRp27ZtysnJsesTHh6uevXq2fps3LhRVqvVlkRKUosWLWS1Wm19CoNEEgAAwIHi4uJs8xAvH3FxcabulZqaKkkKDQ21aw8NDbWdS01Nlbe3twIDA6/ZJyQkJN/9Q0JCbH0Kg+1/AACA23Pk9j8TJ07UmDFj7Np8fHz+0T0tVwRsGEa+titd2aeg/oW5z99RkQQAAHAgHx8fBQQE2B1mE8mwsDBJylc1PHbsmK1KGRYWpgsXLigtLe2afY4ePZrv/sePH89X7bwWEkkAAOD2LA78pzhVq1ZNYWFhWrlypa3twoULWrt2raKjoyVJjRs3lpeXl12flJQU7d6929YnKipK6enp2rx5s63Ppk2blJ6ebutTGAxtAwAAuJCMjAz99ttvts/79+9XUlKSgoKCVLlyZcXGxmrKlCmqVauWatWqpSlTpqh06dLq16+fJMlqtWrw4MEaO3asgoODFRQUpHHjxql+/fpq3769JKlOnTrq3LmzhgwZotmzZ0uShg4dqm7duhV6xbZEIgkAAGBm33CH2bp1q9q0aWP7fHl+5YABAzR//nxNmDBBWVlZGj58uNLS0tS8eXN9++238vf3t10zffp0eXp6qnfv3srKylK7du00f/58eXh42PrEx8dr9OjRttXdPXr0uOrelVfDPpIAbirsIwmUXM7cRzL1TI7D7h0W4OWwezsbFUkAAOD2XKggeVMhkQQAAG7Pkdv/lGSs2gYAAIApVCQBAIDbK+5tetwFFUkAAACYQkUSAACAgqQpVCQBAABgChVJAADg9ihImkNFEgAAAKZQkQQAAG6PfSTNIZEEAABuj+1/zGFoGwAAAKZQkQQAAG6PoW1zqEgCAADAFBJJAAAAmEIiCQAAAFOYIwkAANwecyTNoSIJAAAAU6hIAgAAt8c+kuaQSAIAALfH0LY5DG0DAADAFCqSAADA7VGQNIeKJAAAAEyhIgkAAEBJ0hQqkgAAADCFiiQAAHB7bP9jDhVJAAAAmEJFEgAAuD32kTSHiiQAAABMoSIJAADcHgVJc0gkAQAAyCRNYWgbAAAAplCRBAAAbo/tf8yhIgkAAABTqEgCAAC3x/Y/5lCRBAAAgCkWwzAMZwcBmJWdna24uDhNnDhRPj4+zg4HQDHi7zfg+kgkcVM7c+aMrFar0tPTFRAQ4OxwABQj/n4Dro+hbQAAAJhCIgkAAABTSCQB
AABgCokkbmo+Pj6aNGkSE/GBEoi/34DrY7ENAAAATKEiCQAAAFNIJAEAAGAKiSQAAABMIZHETWnNmjWyWCw6ffr0NftVrVpVM2bMuCExAXCeyZMnq2HDhs4OA3A7LLbBTenChQs6deqUQkNDZbFYNH/+fMXGxuZLLI8fP64yZcqodOnSzgkUQLGzWCxasmSJevXqZWvLyMhQdna2goODnRcY4IY8nR0AYIa3t7fCwsKu269ChQo3IBoAzla2bFmVLVvW2WEAboehbThM69atNXLkSI0cOVLlypVTcHCwnn76aV0ugqelpenBBx9UYGCgSpcurS5dumjfvn226//88091795dgYGBKlOmjG699VatWLFCkv3Q9po1a/TQQw8pPT1dFotFFotFkydPlmQ/tN23b1/df//9djHm5OSofPnymjdvniTJMAxNmzZN1atXl5+fn2677TZ9+umnDv6mgJtD69atNXr0aE2YMEFBQUEKCwuz/V2TpPT0dA0dOlQhISEKCAhQ27Zt9dNPP9nd48UXX1RISIj8/f318MMP63/+53/shqS3bNmiDh06qHz58rJarWrVqpW2b99uO1+1alVJ0r///W9ZLBbb578PbX/zzTfy9fXNN0IxevRotWrVyvZ5w4YNuuuuu+Tn56eIiAiNHj1amZmZ//h7AtwJiSQcasGCBfL09NSmTZv0xhtvaPr06XrvvfckSQMHDtTWrVu1dOlSbdy4UYZhqGvXrsrJyZEkjRgxQtnZ2frhhx+0a9cuTZ06tcCKQ3R0tGbMmKGAgAClpKQoJSVF48aNy9evf//+Wrp0qTIyMmxt33zzjTIzM3XvvfdKkp5++mnNmzdPs2bN0p49e/T444/rgQce0Nq1ax3x9QA3nQULFqhMmTLatGmTpk2bpueff14rV66UYRi6++67lZqaqhUrVmjbtm26/fbb1a5dO506dUqSFB8fr5deeklTp07Vtm3bVLlyZc2aNcvu/mfPntWAAQO0bt06JSYmqlatWuratavOnj0r6VKiKUnz5s1TSkqK7fPftW/fXuXKldNnn31ma8vNzdUnn3yi/v37S5J27dqlTp066Z577tHOnTu1ePFirV+/XiNHjnTI9waUWAbgIK1atTLq1Klj5OXl2dqeeOIJo06dOsavv/5qSDJ+/PFH27kTJ04Yfn5+xieffGIYhmHUr1/fmDx5coH3Xr16tSHJSEtLMwzDMObNm2dYrdZ8/apUqWJMnz7dMAzDuHDhglG+fHnjgw8+sJ3v27evcd999xmGYRgZGRmGr6+vsWHDBrt7DB482Ojbt2+R3x8oaVq1amXccccddm1NmzY1nnjiCeP77783AgICjPPnz9udr1GjhjF79mzDMAyjefPmxogRI+zOt2zZ0rjtttuu+syLFy8a/v7+xrJly2xtkowlS5bY9Zs0aZLdfUaPHm20bdvW9vmbb74xvL29jVOnThmGYRgxMTHG0KFD7e6xbt06o1SpUkZWVtZV4wFgj4okHKpFixayWCy2z1FRUdq3b59+/vlneXp6qnnz5rZzwcHBioyMVHJysqRLw1AvvviiWrZsqUmTJmnnzp3/KBYvLy/dd999io+PlyRlZmbqyy+/tFUofv75Z50/f14dOnSwzbcqW7asPvjgA/3+++//6NlASdGgQQO7zxUrVtSxY8e0bds2ZWRkKDg42O7vz/79+21/f/bu3atmzZrZXX/l52PHjmnYsGGqXbu2rFarrFarMjIydPDgwSLF2b9/f61Zs0ZHjhyRdKka2rVrVwUGBkqStm3bpvnz59vF2qlTJ+Xl5Wn//v1FehbgzlhsA5diGIYt8Xz44YfVqVMnffXVV/r2228VFxenV199VaNGjTJ9//79+6tVq1Y6duyYVq5cKV9fX3Xp0kWSlJeXJ0n66quvVKlSJbvr+K1f4BIvLy+7zxaLRXl5ecrLy1PFihW1Zs2afNeUK1fOrv/fGVdsHDJw4EAdP35cM2bMUJUq
VeTj46OoqChduHChSHE2a9ZMNWrU0KJFi/Too49qyZIltrnQ0qW/74888ohGjx6d79rKlSsX6VmAOyORhEMlJibm+1yrVi3VrVtXFy9e1KZNmxQdHS1JOnnypH799VfVqVPH1j8iIkLDhg3TsGHDNHHiRM2ZM6fARNLb21u5ubnXjSc6OloRERFavHixvv76a913333y9vaWJNWtW1c+Pj46ePCg3YR8ANd3++23KzU1VZ6enrYFMFeKjIzU5s2bFRMTY2vbunWrXZ9169bp7bffVteuXSVJhw4d0okTJ+z6eHl5Ferve79+/RQfH69bbrlFpUqV0t13320X7549e1SzZs3CviKAAjC0DYc6dOiQxowZo7179+rjjz/Wm2++qccee0y1atVSz549NWTIEK1fv14//fSTHnjgAVWqVEk9e/aUJMXGxuqbb77R/v37tX37dq1atcouyfy7qlWrKiMjQ99//71OnDihc+fOFdjPYrGoX79+euedd7Ry5Uo98MADtnP+/v4aN26cHn/8cS1YsEC///67duzYobfeeksLFiwo/i8HKEHat2+vqKgo9erVS998840OHDigDRs26Omnn7Yli6NGjdLcuXO1YMEC7du3Ty+++KJ27txpV6WsWbOmFi5cqOTkZG3atEn9+/eXn5+f3bOqVq2q77//XqmpqUpLS7tqTP3799f27dv10ksv6T//+Y98fX1t55544glt3LhRI0aMUFJSkvbt26elS5f+oxEPwB2RSMKhHnzwQWVlZalZs2YaMWKERo0apaFDh0q6tOqycePG6tatm6KiomQYhlasWGEbOsvNzdWIESNUp04dde7cWZGRkXr77bcLfE50dLSGDRumPn36qEKFCpo2bdpVY+rfv79+/vlnVapUSS1btrQ798ILL+jZZ59VXFyc6tSpo06dOmnZsmWqVq1aMX0jQMlksVi0YsUK3XXXXRo0aJBq166t+++/XwcOHFBoaKikS3/3Jk6cqHHjxun222/X/v37NXDgQLsE7/3331daWpoaNWqkmJgYjR49WiEhIXbPevXVV7Vy5UpFRESoUaNGV42pVq1aatq0qXbu3GmbC31ZgwYNtHbtWu3bt0933nmnGjVqpGeeeUYVK1Ysxm8FKPn4ZRs4TOvWrdWwYUN+ohDAVXXo0EFhYWFauHChs0MBYAJzJAEAN8S5c+f0zjvvqFOnTvLw8NDHH3+s7777TitXrnR2aABMIpEEANwQl4e/X3zxRWVnZysyMlKfffaZ2rdv7+zQAJjE0DYAAABMYbENAAAATCGRBAAAgCkkkgAAADCFRBIAAACmkEgCAADAFBJJAKZNnjxZDRs2tH0eOHCgevXqdcPjOHDggCwWi5KSkhz2jCvf1YwbEScA3EgkkkAJM3DgQFksFlksFnl5eal69eoaN26cMjMzHf7s119/XfPnzy9U3xudVLVu3VqxsbE35FkA4C7YkBwogTp37qx58+YpJydH69at08MPP6zMzEzNmjUrX9+cnBzb75v/U1artVjuAwC4OVCRBEogHx8fhYWFKSIiQv369VP//v31xRdfSPq/Idr3339f1atXl4+PjwzDUHp6uoYOHaqQkBAFBASobdu2+umnn+zu+/LLLys0NFT+/v4aPHiwzp8/b3f+yqHtvLw8TZ06VTVr1pSPj48qV66sl156SZJUrVo1SVKjRo1ksVjUunVr23Xz5s1TnTp15Ovrq3/96196++237Z6zefNmNWrUSL6+vmrSpIl27Njxj7+zJ554QrVr11bp0qVVvXp1PfPMM8rJycnXb/bs2YqIiFDp0qV133336fTp03bnrxf736Wlpal///6qUKGC/Pz8VKtWLc2bN+8fvwsA3ChUJAE34OfnZ5cU/fbbb/rkk0/02WefycPDQ5J09913KygoSCtWrJDVatXs2bPVrl07/frrrwoKCtInn3yiSZMm6a233tKdd96phQsX6o033lD16tWv+tyJEydqzpw5
mj59uu644w6lpKTol19+kXQpGWzWrJm+++473XrrrfL29pYkzZkzR5MmTdLMmTPVqFEj7dixQ0OGDFGZMmU0YMAAZWZmqlu3bmrbtq0+/PBD7d+/X4899tg//o78/f01f/58hYeHa9euXRoyZIj8/f01YcKEfN/bsmXLdObMGQ0ePFgjRoxQfHx8oWK/0jPPPKOff/5ZX3/9tcqXL6/ffvtNWVlZ//hdAOCGMQCUKAMGDDB69uxp+7xp0yYjODjY6N27t2EYhjFp0iTDy8vLOHbsmK3P999/bwQEBBjnz5+3u1eNGjWM2bNnG4ZhGFFRUcawYcPszjdv3ty47bbbCnz2mTNnDB8fH2POnDkFxrl//35DkrFjxw679oiICOOjjz6ya3vhhReMqKgowzAMY/bs2UZQUJCRmZlpOz9r1qwC7/V3rVq1Mh577LGrnr/StGnTjMaNG9s+T5o0yfDw8DAOHTpka/v666+NUqVKGSkpKYWK/cp37t69u/HQQw8VOiYAcDVUJIESaPny5SpbtqwuXryonJwc9ezZU2+++abtfJUqVVShQgXb523btikjI0PBwcF298nKytLvv/8uSUpOTtawYcPszkdFRWn16tUFxpCcnKzs7Gy1a9eu0HEfP35chw4d0uDBgzVkyBBb+8WLF23zL5OTk3XbbbepdOnSdnH8U59++qlmzJih3377TRkZGbp48aICAgLs+lSuXFm33HKL3XPz8vK0d+9eeXh4XDf2Kz366KO69957tX37dnXs2FG9evVSdHT0P34XALhRSCSBEqhNmzaaNWuWvLy8FB4enm8xTZkyZew+5+XlqWLFilqzZk2+e5UrV85UDH5+fkW+Ji8vT9KlIeLmzZvbnbs8BG8Yhql4riUxMVH333+/nnvuOXXq1ElWq1WLFi3Sq6++es3rLBaL7f8WJvYrdenSRX/++ae++uorfffdd2rXrp1GjBihV155pRjeCgAcj0QSKIHKlCmjmjVrFrr/7bffrtTUVHl6eqpq1aoF9qlTp44SExP14IMP2toSExOves9atWrJz89P33//vR5++OF85y/PiczNzbW1hYaGqlKlSvrjjz/Uv3//Au9bt25dLVy4UFlZWbZk9VpxFMaPP/6oKlWq6KmnnrK1/fnnn/n6HTx4UEeOHFF4eLgkaePGjSpVqpRq165dqNgLUqFCBQ0cOFADBw7UnXfeqfHjx5NIArhpkEgCUPv27RUVFaVevXpp6tSpioyM1JEjR7RixQr16tVLTZo00WOPPaYBAwaoSZMmuuOOOxQfH689e/ZcdbGNr6+vnnjiCU2YMEHe3t5q2bKljh8/rj179mjw4MEKCQmRn5+fEhISdMstt8jX11dWq1WTJ0/W6NGjFRAQoC5duig7O1tbt25VWlqaxowZo379+umpp57S4MGD9fTTT+vAgQOFTryOHz+eb9/KsLAw1axZUwcPHtSiRYvUtGlTffXVV1qyZEmB7zRgwAC98sorOnPmjEaPHq3evXsrLCxMkq4b+5WeffZZNW7cWLfeequys7O1fPly1alTp1DvAgAuwdmTNAEUrysX21xp0qRJdgtkLjtz5owxatQoIzw83PDy8jIiIiKM/v37GwcPHrT1eemll4zy5csbZcuWNQYMGGBMmDDhqottDMMwcnNzjRdffNGoUqWK4eXlZVSuXNmYMmWK7fycOXOMiIgIo1SpUkarVq1s7fHx8UbDhg0Nb29vIzAw0LjrrruMzz//3HZ+48aNxm233WZ4e3sbDRs2ND777LNCLbaRlO+YNGmSYRiGMX78eCM4ONgoW7as0adPH2P69OmG1WrN9729/fbbRnh4uOHr62vcc889xqlTp+yec63Yr1xs88ILLxh16tQx/Pz8jKCgIKNnz57GH3/8cdV3AABXYzEMB0w4AgAAQInHhuQAAAAwhUQSAAAAppBIAgAAwBQSSQAAAJhCIgkAAABTSCQBAABgCokkAAAATCGRBAAAgCkk
kgAAADCFRBIAAACmkEgCAADAlP8Hqf7JB2esBkAAAAAASUVORK5CYII=",
681
+ "text/plain": [
682
+ "<Figure size 800x600 with 2 Axes>"
683
+ ]
684
+ },
685
+ "metadata": {},
686
+ "output_type": "display_data"
687
+ }
688
+ ],
689
+ "source": [
690
+ "plot_confusion_matrix(y_test, y_pred, ['positive', 'negative'], 'NB')"
691
+ ]
692
+ },
693
+ {
694
+ "cell_type": "code",
695
+ "execution_count": 12,
696
+ "metadata": {
697
+ "colab": {
698
+ "base_uri": "https://localhost:8080/"
699
+ },
700
+ "id": "2580FJCGs_oQ",
701
+ "outputId": "118f79e2-6b57-4cc0-a631-c2ef8a7e317e"
702
+ },
703
+ "outputs": [
704
+ {
705
+ "name": "stdout",
706
+ "output_type": "stream",
707
+ "text": [
708
+ "Classification Report NB:\n",
709
+ " precision recall f1-score support\n",
710
+ "\n",
711
+ " negative 0.86 0.87 0.86 5017\n",
712
+ " positive 0.87 0.86 0.86 4983\n",
713
+ "\n",
714
+ " accuracy 0.86 10000\n",
715
+ " macro avg 0.86 0.86 0.86 10000\n",
716
+ "weighted avg 0.86 0.86 0.86 10000\n",
717
+ "\n"
718
+ ]
719
+ }
720
+ ],
721
+ "source": [
722
+ "# Print the evaluation metrics\n",
723
+ "print_evaluation_metrics(y_test, y_pred, 'NB')"
724
+ ]
725
+ },
726
+ {
727
+ "cell_type": "markdown",
728
+ "metadata": {
729
+ "id": "x0JBy6nXvdjC"
730
+ },
731
+ "source": [
732
+ "# Conclusion\n",
733
+ "\n",
734
+ "The classification report shows precision and recall between 86% and 87% for both classes. The **F1-score**, which combines precision and recall, is approximately 86%, indicating a good balance between the two. The model's overall **accuracy** is 86%, meaning it correctly classified about 86% of the examples in the test set.\n",
735
+ "\n",
736
+ "The Naive Bayes model with TF-IDF vectorization achieved well-balanced precision, recall, and F1-score for both classes, with an overall accuracy of 86%, so it predicts the sentiment of the reviews reasonably accurately. The statistical model therefore performs considerably better than the symbolic approach.\n"
737
+ ]
738
+ }
739
+ ],
740
+ "metadata": {
741
+ "accelerator": "GPU",
742
+ "colab": {
743
+ "gpuType": "T4",
744
+ "provenance": []
745
+ },
746
+ "kernelspec": {
747
+ "display_name": "Python 3",
748
+ "name": "python3"
749
+ },
750
+ "language_info": {
751
+ "codemirror_mode": {
752
+ "name": "ipython",
753
+ "version": 3
754
+ },
755
+ "file_extension": ".py",
756
+ "mimetype": "text/x-python",
757
+ "name": "python",
758
+ "nbconvert_exporter": "python",
759
+ "pygments_lexer": "ipython3",
760
+ "version": "3.11.7"
761
+ }
762
+ },
763
+ "nbformat": 4,
764
+ "nbformat_minor": 0
765
+ }
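The conclusion of `Estatistico.ipynb` says the F1-score combines precision and recall. As a minimal sketch of that relationship, the snippet below recomputes F1 as the harmonic mean of the rounded per-class precision and recall printed in the classification report. The helper name `f1_score` and the `report` dictionary are illustrative assumptions and are not taken from the notebook.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded per-class values copied from the classification report output.
# Hypothetical container: the notebook itself does not build this dict.
report = {
    "negative": {"precision": 0.86, "recall": 0.87},
    "positive": {"precision": 0.87, "recall": 0.86},
}

# F1 per class, then the unweighted (macro) average over the two classes.
f1_by_class = {
    label: f1_score(m["precision"], m["recall"]) for label, m in report.items()
}
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

for label, value in f1_by_class.items():
    print(f"{label}: F1 = {value:.2f}")
print(f"macro avg: F1 = {macro_f1:.2f}")
```

Because the inputs are already rounded to two decimals, the recomputed values only approximate what scikit-learn derives from the raw counts; they are useful as a sanity check, not as a replacement for the report.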
notebooks_explicativos/Neural_Bert.ipynb ADDED
@@ -0,0 +1,1291 @@
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "metadata": {},
6
+ "source": [
7
+ "# SCC0633/SCC5908 - Natural Language Processing\n",
8
+ "> **Instructor:** Thiago Alexandre Salgueiro Pardo \\\\\n",
9
+ "> **PAE Teaching Intern:** Germano Antonio Zani Jorge\n",
10
+ "\n",
11
+ "\n",
12
+ "# Group Members: GPTrouxas\n",
13
+ "> André Guarnier De Mitri - 11395579 \\\\\n",
14
+ "> Daniel Carvalho - 10685702 \\\\\n",
15
+ "> Fernando - 11795342 \\\\\n",
16
+ "> Lucas Henrique Sant'Anna - 10748521 \\\\\n",
17
+ "> Magaly L Fujimoto - 4890582 \\\\\n"
18
+ ]
19
+ },
20
+ {
21
+ "cell_type": "markdown",
22
+ "metadata": {},
23
+ "source": [
24
+ "# Neural Approach Using BERT\n",
25
+ "![alt text](../imagens/BERT_TDIDF.png)"
26
+ ]
27
+ },
28
+ {
29
+ "cell_type": "markdown",
30
+ "metadata": {},
31
+ "source": [
32
+ "###"
33
+ ]
34
+ },
35
+ {
36
+ "cell_type": "markdown",
37
+ "metadata": {
38
+ "id": "6yecpJR0feeQ"
39
+ },
40
+ "source": [
41
+ "## Importing libraries"
42
+ ]
43
+ },
44
+ {
45
+ "cell_type": "code",
46
+ "execution_count": 1,
47
+ "metadata": {
48
+ "id": "FAIvyZwodEtm"
49
+ },
50
+ "outputs": [],
51
+ "source": [
52
+ "import torch\n",
53
+ "import numpy as np\n",
54
+ "import matplotlib.pyplot as plt\n",
55
+ "import math\n",
56
+ "from tqdm.notebook import tqdm\n",
57
+ "import pandas as pd"
58
+ ]
59
+ },
60
+ {
61
+ "cell_type": "code",
62
+ "execution_count": 3,
63
+ "metadata": {},
64
+ "outputs": [],
65
+ "source": [
66
+ "#!pip install transformers seaborn nltk"
67
+ ]
68
+ },
69
+ {
70
+ "cell_type": "markdown",
71
+ "metadata": {},
72
+ "source": [
73
+ "## Loading data"
74
+ ]
75
+ },
76
+ {
77
+ "cell_type": "code",
78
+ "execution_count": 3,
79
+ "metadata": {
80
+ "colab": {
81
+ "base_uri": "https://localhost:8080/",
82
+ "height": 206
83
+ },
84
+ "id": "LYgXl3RIfgfo",
85
+ "outputId": "eb496faf-7826-44f7-fa88-3b21fb6e7cbf"
86
+ },
87
+ "outputs": [
88
+ {
89
+ "data": {
90
+ "text/html": [
91
+ "<div>\n",
92
+ "<style scoped>\n",
93
+ " .dataframe tbody tr th:only-of-type {\n",
94
+ " vertical-align: middle;\n",
95
+ " }\n",
96
+ "\n",
97
+ " .dataframe tbody tr th {\n",
98
+ " vertical-align: top;\n",
99
+ " }\n",
100
+ "\n",
101
+ " .dataframe thead th {\n",
102
+ " text-align: right;\n",
103
+ " }\n",
104
+ "</style>\n",
105
+ "<table border=\"1\" class=\"dataframe\">\n",
106
+ " <thead>\n",
107
+ " <tr style=\"text-align: right;\">\n",
108
+ " <th></th>\n",
109
+ " <th>review</th>\n",
110
+ " <th>sentiment</th>\n",
111
+ " </tr>\n",
112
+ " </thead>\n",
113
+ " <tbody>\n",
114
+ " <tr>\n",
115
+ " <th>0</th>\n",
116
+ " <td>One of the other reviewers has mentioned that ...</td>\n",
117
+ " <td>positive</td>\n",
118
+ " </tr>\n",
119
+ " <tr>\n",
120
+ " <th>1</th>\n",
121
+ " <td>A wonderful little production. &lt;br /&gt;&lt;br /&gt;The...</td>\n",
122
+ " <td>positive</td>\n",
123
+ " </tr>\n",
124
+ " <tr>\n",
125
+ " <th>2</th>\n",
126
+ " <td>I thought this was a wonderful way to spend ti...</td>\n",
127
+ " <td>positive</td>\n",
128
+ " </tr>\n",
129
+ " <tr>\n",
130
+ " <th>3</th>\n",
131
+ " <td>Basically there's a family where a little boy ...</td>\n",
132
+ " <td>negative</td>\n",
133
+ " </tr>\n",
134
+ " <tr>\n",
135
+ " <th>4</th>\n",
136
+ " <td>Petter Mattei's \"Love in the Time of Money\" is...</td>\n",
137
+ " <td>positive</td>\n",
138
+ " </tr>\n",
139
+ " </tbody>\n",
140
+ "</table>\n",
141
+ "</div>"
142
+ ],
143
+ "text/plain": [
144
+ " review sentiment\n",
145
+ "0 One of the other reviewers has mentioned that ... positive\n",
146
+ "1 A wonderful little production. <br /><br />The... positive\n",
147
+ "2 I thought this was a wonderful way to spend ti... positive\n",
148
+ "3 Basically there's a family where a little boy ... negative\n",
149
+ "4 Petter Mattei's \"Love in the Time of Money\" is... positive"
150
+ ]
151
+ },
152
+ "execution_count": 3,
153
+ "metadata": {},
154
+ "output_type": "execute_result"
155
+ }
156
+ ],
157
+ "source": [
158
+ "df_reviews = pd.read_csv('imdb_reviews.csv')\n",
159
+ "df_reviews.head()"
160
+ ]
161
+ },
162
+ {
163
+ "cell_type": "markdown",
164
+ "metadata": {},
165
+ "source": [
166
+ "## Mapping the classes\n",
167
+ "- Positive sentiment is assigned label 1\n",
168
+ "- Negative sentiment is assigned label 0"
169
+ ]
170
+ },
171
+ {
172
+ "cell_type": "code",
173
+ "execution_count": 4,
174
+ "metadata": {
175
+ "colab": {
176
+ "base_uri": "https://localhost:8080/",
177
+ "height": 206
178
+ },
179
+ "id": "D-5n8XzJbWOO",
180
+ "outputId": "cef630cc-b0cc-4598-c53f-d32636bfcd86"
181
+ },
182
+ "outputs": [
183
+ {
184
+ "data": {
185
+ "text/html": [
186
+ "<div>\n",
187
+ "<style scoped>\n",
188
+ " .dataframe tbody tr th:only-of-type {\n",
189
+ " vertical-align: middle;\n",
190
+ " }\n",
191
+ "\n",
192
+ " .dataframe tbody tr th {\n",
193
+ " vertical-align: top;\n",
194
+ " }\n",
195
+ "\n",
196
+ " .dataframe thead th {\n",
197
+ " text-align: right;\n",
198
+ " }\n",
199
+ "</style>\n",
200
+ "<table border=\"1\" class=\"dataframe\">\n",
201
+ " <thead>\n",
202
+ " <tr style=\"text-align: right;\">\n",
203
+ " <th></th>\n",
204
+ " <th>review</th>\n",
205
+ " <th>sentiment</th>\n",
206
+ " </tr>\n",
207
+ " </thead>\n",
208
+ " <tbody>\n",
209
+ " <tr>\n",
210
+ " <th>0</th>\n",
211
+ " <td>One of the other reviewers has mentioned that ...</td>\n",
212
+ " <td>1</td>\n",
213
+ " </tr>\n",
214
+ " <tr>\n",
215
+ " <th>1</th>\n",
216
+ " <td>A wonderful little production. &lt;br /&gt;&lt;br /&gt;The...</td>\n",
217
+ " <td>1</td>\n",
218
+ " </tr>\n",
219
+ " <tr>\n",
220
+ " <th>2</th>\n",
221
+ " <td>I thought this was a wonderful way to spend ti...</td>\n",
222
+ " <td>1</td>\n",
223
+ " </tr>\n",
224
+ " <tr>\n",
225
+ " <th>3</th>\n",
226
+ " <td>Basically there's a family where a little boy ...</td>\n",
227
+ " <td>0</td>\n",
228
+ " </tr>\n",
229
+ " <tr>\n",
230
+ " <th>4</th>\n",
231
+ " <td>Petter Mattei's \"Love in the Time of Money\" is...</td>\n",
232
+ " <td>1</td>\n",
233
+ " </tr>\n",
234
+ " </tbody>\n",
235
+ "</table>\n",
236
+ "</div>"
237
+ ],
238
+ "text/plain": [
239
+ " review sentiment\n",
240
+ "0 One of the other reviewers has mentioned that ... 1\n",
241
+ "1 A wonderful little production. <br /><br />The... 1\n",
242
+ "2 I thought this was a wonderful way to spend ti... 1\n",
243
+ "3 Basically there's a family where a little boy ... 0\n",
244
+ "4 Petter Mattei's \"Love in the Time of Money\" is... 1"
245
+ ]
246
+ },
247
+ "execution_count": 4,
248
+ "metadata": {},
249
+ "output_type": "execute_result"
250
+ }
251
+ ],
252
+ "source": [
253
+ "def map_sentiments(sentiment):\n",
254
+ "    if sentiment == 'positive':\n",
255
+ "        return 1\n",
256
+ "    return 0\n",
257
+ "\n",
258
+ "df_reviews['sentiment'] = df_reviews['sentiment'].apply(map_sentiments)\n",
259
+ "df_reviews.head()"
260
+ ]
261
+ },
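The mapping above can also be written as a dictionary lookup, which fails loudly on an unexpected label instead of silently mapping it to 0. This is a minimal sketch, assuming plain Python strings as input; `LABEL_TO_ID` and `encode_sentiment` are hypothetical names that do not appear in the notebook.

```python
# Hypothetical helper: an explicit label table instead of an if/else fallback.
LABEL_TO_ID = {"positive": 1, "negative": 0}


def encode_sentiment(sentiment: str) -> int:
    """Return 1 for 'positive' and 0 for 'negative'; reject anything else."""
    try:
        return LABEL_TO_ID[sentiment]
    except KeyError:
        raise ValueError(f"unexpected sentiment label: {sentiment!r}") from None


if __name__ == "__main__":
    print([encode_sentiment(s) for s in ["positive", "negative", "positive"]])
```

With pandas, the same table could be passed to `df_reviews['sentiment'].map(LABEL_TO_ID)`; labels missing from the table would then show up as NaN rather than as class 0.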
262
+ {
263
+ "cell_type": "markdown",
264
+ "metadata": {},
265
+ "source": [
266
+ "# Text-cleaning functions\n",
267
+ "**lowercase_text(text)** Converts the text to lowercase so that casing is uniform.\n",
268
+ "\n",
269
+ "\n",
270
+ "**remove_html(text)** Removes any HTML tags from the text, cleaning data that comes from HTML sources.\n",
271
+ "\n",
272
+ "\n",
273
+ "**remove_url(text)** Removes URLs from the text, eliminating links that may not be relevant to the text analysis.\n",
274
+ "\n",
275
+ "\n",
276
+ "**remove_punctuations(text)** Removes punctuation from the text (each mark is replaced by a space) to simplify its structure, keeping only words.\n",
277
+ "\n",
278
+ "**remove_emojis(text)** Removes emojis from the text to avoid non-verbal characters that can interfere with the textual analysis.\n",
279
+ "\n",
280
+ "**remove_stop_words(text)** Removes stop words (common words such as \"and\", \"of\", \"the\") that usually add little value to text analysis.\n",
281
+ "\n",
282
+ "**stem_words(text)** Applies stemming to the words in the text, reducing them to their root (for example, \"running\" becomes \"run\") to normalize word variations.\n",
283
+ "\n",
284
+ "**preprocess_text(text)** Applies all of the functions above in sequence to fully preprocess the text, making it better suited to text analysis or modeling.\n",
285
+ "\n",
286
+ "\n",
287
+ "\n"
288
+ ]
289
+ },
290
+ {
291
+ "cell_type": "code",
292
+ "execution_count": 5,
293
+ "metadata": {
294
+ "colab": {
295
+ "base_uri": "https://localhost:8080/",
296
+ "height": 241
297
+ },
298
+ "id": "PnFHO62rnWn-",
299
+ "outputId": "17fb6619-fab9-4395-de5d-4c5199e7e45e"
300
+ },
301
+ "outputs": [
302
+ {
303
+ "name": "stderr",
304
+ "output_type": "stream",
305
+ "text": [
306
+ "[nltk_data] Downloading package stopwords to\n",
307
+ "[nltk_data] C:\\Users\\andre\\AppData\\Roaming\\nltk_data...\n",
308
+ "[nltk_data] Package stopwords is already up-to-date!\n"
309
+ ]
310
+ },
311
+ {
312
+ "data": {
313
+ "text/html": [
314
+ "<div>\n",
315
+ "<style scoped>\n",
316
+ " .dataframe tbody tr th:only-of-type {\n",
317
+ " vertical-align: middle;\n",
318
+ " }\n",
319
+ "\n",
320
+ " .dataframe tbody tr th {\n",
321
+ " vertical-align: top;\n",
322
+ " }\n",
323
+ "\n",
324
+ " .dataframe thead th {\n",
325
+ " text-align: right;\n",
326
+ " }\n",
327
+ "</style>\n",
328
+ "<table border=\"1\" class=\"dataframe\">\n",
329
+ " <thead>\n",
330
+ " <tr style=\"text-align: right;\">\n",
331
+ " <th></th>\n",
332
+ " <th>review</th>\n",
333
+ " <th>sentiment</th>\n",
334
+ " </tr>\n",
335
+ " </thead>\n",
336
+ " <tbody>\n",
337
+ " <tr>\n",
338
+ " <th>0</th>\n",
339
+ " <td>one review mention watch 1 oz episod hook righ...</td>\n",
340
+ " <td>1</td>\n",
341
+ " </tr>\n",
342
+ " <tr>\n",
343
+ " <th>1</th>\n",
344
+ " <td>wonder littl product film techniqu unassum old...</td>\n",
345
+ " <td>1</td>\n",
346
+ " </tr>\n",
347
+ " <tr>\n",
348
+ " <th>2</th>\n",
349
+ " <td>thought wonder way spend time hot summer weeke...</td>\n",
350
+ " <td>1</td>\n",
351
+ " </tr>\n",
352
+ " <tr>\n",
353
+ " <th>3</th>\n",
354
+ " <td>basic famili littl boy jake think zombi closet...</td>\n",
355
+ " <td>0</td>\n",
356
+ " </tr>\n",
357
+ " <tr>\n",
358
+ " <th>4</th>\n",
359
+ " <td>petter mattei love time money visual stun film...</td>\n",
360
+ " <td>1</td>\n",
361
+ " </tr>\n",
362
+ " </tbody>\n",
363
+ "</table>\n",
364
+ "</div>"
365
+ ],
366
+ "text/plain": [
367
+ " review sentiment\n",
368
+ "0 one review mention watch 1 oz episod hook righ... 1\n",
369
+ "1 wonder littl product film techniqu unassum old... 1\n",
370
+ "2 thought wonder way spend time hot summer weeke... 1\n",
371
+ "3 basic famili littl boy jake think zombi closet... 0\n",
372
+ "4 petter mattei love time money visual stun film... 1"
373
+ ]
374
+ },
375
+ "execution_count": 5,
376
+ "metadata": {},
377
+ "output_type": "execute_result"
378
+ }
379
+ ],
380
+ "source": [
381
+ "import re\n",
382
+ "import nltk\n",
383
+ "from nltk.corpus import stopwords\n",
384
+ "from nltk.stem import PorterStemmer\n",
385
+ "\n",
386
+ "\n",
387
+ "def lowercase_text(text):\n",
388
+ "    return text.lower()\n",
389
+ "\n",
390
+ "def remove_html(text):\n",
391
+ "    return re.sub(r'<[^<]+?>', '', text)\n",
392
+ "\n",
393
+ "def remove_url(text):\n",
394
+ "    return re.sub(r'http[s]?://\\S+|www\\.\\S+', '', text)\n",
395
+ "\n",
396
+ "def remove_punctuations(text):\n",
397
+ "    tokens_list = '!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'\n",
398
+ "    for char in text:\n",
399
+ "        if char in tokens_list:\n",
400
+ "            text = text.replace(char, ' ')\n",
401
+ "\n",
402
+ "    return text\n",
403
+ "\n",
404
+ "def remove_emojis(text):\n",
405
+ "    emojis = re.compile(\"[\"\n",
406
+ "        u\"\\U0001F600-\\U0001F64F\"\n",
407
+ "        u\"\\U0001F300-\\U0001F5FF\"\n",
408
+ "        u\"\\U0001F680-\\U0001F6FF\"\n",
409
+ "        u\"\\U0001F1E0-\\U0001F1FF\"\n",
410
+ "        u\"\\U00002500-\\U00002BEF\"\n",
411
+ "        u\"\\U00002702-\\U000027B0\"\n",
412
+ "        u\"\\U00002702-\\U000027B0\"\n",
413
+ "        u\"\\U000024C2-\\U0001F251\"\n",
414
+ "        u\"\\U0001f926-\\U0001f937\"\n",
415
+ "        u\"\\U00010000-\\U0010ffff\"\n",
416
+ "        u\"\\u2640-\\u2642\"\n",
417
+ "        u\"\\u2600-\\u2B55\"\n",
418
+ "        u\"\\u200d\"\n",
419
+ "        u\"\\u23cf\"\n",
420
+ "        u\"\\u23e9\"\n",
421
+ "        u\"\\u231a\"\n",
422
+ "        u\"\\ufe0f\"\n",
423
+ "        u\"\\u3030\"\n",
424
+ "        \"]+\", re.UNICODE)\n",
425
+ "\n",
426
+ "    text = re.sub(emojis, '', text)\n",
427
+ "    return text\n",
428
+ "\n",
429
+ "def remove_stop_words(text):\n",
430
+ "    stop_words = stopwords.words('english')\n",
431
+ "    new_text = ''\n",
432
+ "    for word in text.split():\n",
433
+ "        if word not in stop_words:\n",
434
+ "            new_text += ''.join(f'{word} ')\n",
435
+ "\n",
436
+ "    return new_text.strip()\n",
437
+ "\n",
438
+ "def stem_words(text):\n",
439
+ "    stemmer = PorterStemmer()\n",
440
+ "    new_text = ''\n",
441
+ "    for word in text.split():\n",
442
+ "        new_text += ''.join(f'{stemmer.stem(word)} ')\n",
443
+ "\n",
444
+ "    return new_text\n",
445
+ "\n",
446
+ "def preprocess_text(text):\n",
447
+ "    text = lowercase_text(text)\n",
448
+ "    text = remove_html(text)\n",
449
+ "    text = remove_url(text)\n",
450
+ "    text = remove_punctuations(text)\n",
451
+ "    text = remove_emojis(text)\n",
452
+ "    text = remove_stop_words(text)\n",
453
+ "    text = stem_words(text)\n",
454
+ "\n",
455
+ "    return text\n",
456
+ "\n",
457
+ "nltk.download('stopwords')\n",
458
+ "df_reviews['review'] = df_reviews['review'].apply(preprocess_text)\n",
459
+ "df_reviews.head()"
460
+ ]
461
+ },
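The character-by-character loop in `remove_punctuations` can be replaced by a single `str.translate` call. The sketch below is an illustration under stated assumptions, not the notebook's pipeline: `clean_text` and `PUNCT_TO_SPACE` are hypothetical names, `STOP_WORDS` is a tiny stand-in for NLTK's English list, HTML tags are replaced by a space rather than an empty string so neighbouring words cannot fuse, and stemming is left out so that only the standard library is needed.

```python
import re
import string

# Map every ASCII punctuation mark to a space in one pass.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

# Hypothetical mini stop-word list; the notebook uses stopwords.words('english').
STOP_WORDS = frozenset({"a", "an", "the", "is", "of", "and", "this", "was"})


def clean_text(text: str) -> str:
    """Lowercase, strip HTML tags, URLs and punctuation, then drop stop words."""
    text = text.lower()
    text = re.sub(r"<[^<]+?>", " ", text)                 # HTML tags -> space
    text = re.sub(r"http[s]?://\S+|www\.\S+", " ", text)  # URLs -> space
    text = text.translate(PUNCT_TO_SPACE)                 # punctuation -> space
    return " ".join(w for w in text.split() if w not in STOP_WORDS)


if __name__ == "__main__":
    print(clean_text("A wonderful little production. <br /><br />The filming..."))
```

Building the translation table and the stop-word set once, outside the function, avoids repeating that work for every review when the function is applied to a whole column.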
462
+ {
463
+ "cell_type": "markdown",
464
+ "metadata": {},
465
+ "source": [
466
+ "### Visualizing the class balance"
467
+ ]
468
+ },
469
+ {
470
+ "cell_type": "code",
471
+ "execution_count": 6,
472
+ "metadata": {
473
+ "colab": {
474
+ "base_uri": "https://localhost:8080/",
475
+ "height": 452
476
+ },
477
+ "id": "Gdi_L0HWfntv",
478
+ "outputId": "bce77594-f662-4b3f-c8eb-27d8a188b4f2"
479
+ },
480
+ "outputs": [
481
+ {
482
+ "data": {
483
+ "image/png": "",
484
+ "text/plain": [
485
+ "<Figure size 640x480 with 1 Axes>"
486
+ ]
487
+ },
488
+ "metadata": {},
489
+ "output_type": "display_data"
490
+ }
491
+ ],
492
+ "source": [
493
+ "plt.title('Target value distribution')\n",
494
+ "plt.hist(df_reviews['sentiment'])\n",
495
+ "plt.show()"
496
+ ]
497
+ },
498
+ {
499
+ "cell_type": "markdown",
500
+ "metadata": {},
501
+ "source": [
502
+ "# BERT model"
503
+ ]
504
+ },
505
+ {
506
+ "cell_type": "markdown",
507
+ "metadata": {
508
+ "id": "EDkjlPDakskM"
509
+ },
510
+ "source": [
511
+ "## Installing libraries"
512
+ ]
513
+ },
514
+ {
515
+ "cell_type": "code",
516
+ "execution_count": 4,
517
+ "metadata": {
518
+ "colab": {
519
+ "base_uri": "https://localhost:8080/"
520
+ },
521
+ "id": "lk7m_1xvmWvz",
522
+ "outputId": "ce842053-b261-4768-d9d7-fe9c65c9f6aa"
523
+ },
524
+ "outputs": [],
525
+ "source": [
526
+ "#pip install transformers\n",
527
+ "#pip install accelerate -U\n",
528
+ "#pip install transformers[torch]\n",
529
+ "#pip install datasets evaluate"
530
+ ]
531
+ },
532
+ {
533
+ "cell_type": "markdown",
534
+ "metadata": {},
535
+ "source": [
536
+ "## Loading the pre-trained model and tokenizer"
537
+ ]
538
+ },
539
+ {
540
+ "cell_type": "code",
541
+ "execution_count": 10,
542
+ "metadata": {
543
+ "colab": {
544
+ "base_uri": "https://localhost:8080/"
545
+ },
546
+ "id": "GlyrkK52zMcc",
547
+ "outputId": "a938653b-92c3-4b4e-802c-eacc3f1b6ecf"
548
+ },
549
+ "outputs": [
550
+ {
551
+ "name": "stderr",
552
+ "output_type": "stream",
553
+ "text": [
554
+ "c:\\Users\\andre\\1JUPYTER\\dt_labs\\.venv\\Lib\\site-packages\\tqdm\\auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
555
+ " from .autonotebook import tqdm as notebook_tqdm\n"
556
+ ]
557
+ }
558
+ ],
559
+ "source": [
560
+ "from transformers import AutoTokenizer\n",
561
+ "from transformers import BertForSequenceClassification\n",
562
+ "\n",
563
+ "pre_trained_base = \"bert-base-uncased\"\n",
564
+ "tokenizer = AutoTokenizer.from_pretrained(pre_trained_base)\n",
565
+ "model = BertForSequenceClassification.from_pretrained(pre_trained_base, num_labels = 2, output_attentions=False, output_hidden_states=False)"
566
+ ]
567
+ },
568
+ {
569
+ "cell_type": "markdown",
570
+ "metadata": {},
571
+ "source": [
572
+ "### Tokenizing the sentences and computing token lengths"
573
+ ]
574
+ },
575
+ {
576
+ "cell_type": "code",
577
+ "execution_count": 13,
578
+ "metadata": {
579
+ "id": "LKEjDZCHpk4e"
580
+ },
581
+ "outputs": [],
582
+ "source": [
583
+ "token_lens = []\n",
584
+ "\n",
585
+ "for sentence in df_reviews['review']:\n",
586
+ "    tokens = tokenizer.encode(sentence, max_length=200, truncation=True)\n",
587
+ "    token_lens.append(len(tokens))"
588
+ ]
589
+ },
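`token_lens` is collected above but not inspected in the cells shown. Because `encode` truncates at 200 tokens, any length equal to 200 marks a review that reached the cap. A small sketch of how the list could be summarized; `length_summary` is a hypothetical helper and the sample lengths are made up:

```python
def length_summary(token_lens, max_len=200):
    """Return (mean length, share of sequences that reached max_len)."""
    n = len(token_lens)
    mean_len = sum(token_lens) / n
    share_at_cap = sum(length >= max_len for length in token_lens) / n
    return mean_len, share_at_cap


if __name__ == "__main__":
    # Made-up lengths standing in for the notebook's token_lens list.
    print(length_summary([50, 200, 150, 200]))
```

A large share of capped sequences would suggest that `MAX_LEN` discards a meaningful part of many reviews.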
590
+ {
591
+ "cell_type": "markdown",
592
+ "metadata": {},
593
+ "source": [
594
+ "### Splitting the data into training and validation sets"
595
+ ]
596
+ },
597
+ {
598
+ "cell_type": "code",
599
+ "execution_count": 15,
600
+ "metadata": {
601
+ "id": "H7PfXaVVp2uQ"
602
+ },
603
+ "outputs": [],
604
+ "source": [
605
+ "SEED=42\n",
606
+ "MAX_LEN = 200\n",
607
+ "from sklearn.model_selection import train_test_split\n",
608
+ "df_train, df_val = train_test_split(df_reviews, test_size=0.2, random_state=SEED)"
609
+ ]
610
+ },
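Conceptually, `train_test_split(..., test_size=0.2, random_state=SEED)` shuffles the rows with a fixed seed and holds out 20% of them. The standard-library sketch below illustrates that idea only; `split_rows` is a hypothetical name, and it will not reproduce scikit-learn's exact row order.

```python
import random


def split_rows(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and hold out a test_size fraction."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * test_size))
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    train, val = split_rows(range(50_000))
    print(len(train), len(val))
```

Fixing the seed makes the split repeatable, so validation scores from different runs are computed on the same held-out rows.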
611
+ {
612
+ "cell_type": "markdown",
613
+ "metadata": {},
614
+ "source": [
615
+ "### Processing the data\n",
616
+ "The `process_data` function receives a dataframe row containing a review text and its sentiment label. It first extracts the review text and collapses any extra whitespace. It then tokenizes the text with the BERT tokenizer, applying padding and truncation so that every sequence has the fixed length defined by `MAX_LEN`. Finally, it adds the sentiment label and the cleaned text to the generated encodings and returns a dictionary-like object holding the token encodings, the sentiment label, and the cleaned text."
617
+ ]
618
+ },
619
+ {
620
+ "cell_type": "code",
621
+ "execution_count": 16,
622
+ "metadata": {
623
+ "id": "v7EZ6wd-qDfd"
624
+ },
625
+ "outputs": [],
626
+ "source": [
627
+ "def process_data(row):\n",
628
+ "\n",
629
+ "    text = row['review']\n",
630
+ "    text = str(text)\n",
631
+ "    text = ' '.join(text.split())\n",
632
+ "\n",
633
+ "    encodings = tokenizer(text, padding=\"max_length\", truncation=True, max_length=MAX_LEN)\n",
634
+ "\n",
635
+ "    encodings['label'] = row['sentiment']\n",
636
+ "    encodings['text'] = text\n",
637
+ "\n",
638
+ "    return encodings"
639
+ ]
640
+ },
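To make the effect of `padding="max_length"` and `truncation=True` concrete, the sketch below builds fixed-length `input_ids` and an `attention_mask` by hand. It is an illustration, not the tokenizer's implementation: `pad_or_truncate` is a hypothetical helper, the word-piece ids are arbitrary, and the special-token ids are the ones used by `bert-base-uncased` (101 for `[CLS]`, 102 for `[SEP]`, 0 for `[PAD]`).

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # bert-base-uncased special-token ids


def pad_or_truncate(content_ids, max_len):
    """Build fixed-length input_ids and attention_mask (requires max_len >= 2)."""
    # Truncate the content so [CLS] and [SEP] still fit inside max_len.
    kept = list(content_ids)[: max_len - 2]
    input_ids = [CLS_ID] + kept + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Right-pad with PAD_ID; padded positions get attention 0.
    n_pad = max_len - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * n_pad,
        "attention_mask": attention_mask + [0] * n_pad,
    }


if __name__ == "__main__":
    print(pad_or_truncate([2921, 3198], max_len=6))
```

The attention mask is what lets the model ignore the padded positions, which is why every row can share the same length without the padding influencing the prediction.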
641
+ {
642
+ "cell_type": "code",
643
+ "execution_count": 17,
644
+ "metadata": {
645
+ "id": "d9VgrXNSqIYL"
646
+ },
647
+ "outputs": [],
648
+ "source": [
649
+ "# Training set\n",
650
+ "processed_data_tr = []\n",
651
+ "for i in range(df_train.shape[0]):\n",
652
+ "    processed_data_tr.append(process_data(df_train.iloc[i]))"
653
+ ]
654
+ },
655
+ {
656
+ "cell_type": "code",
657
+ "execution_count": 18,
658
+ "metadata": {
659
+ "id": "p0NLQxoKqJ_k"
660
+ },
661
+ "outputs": [],
662
+ "source": [
663
+ "# Validation set\n",
664
+ "processed_data_val = []\n",
665
+ "for i in range(df_val.shape[0]):\n",
666
+ "    processed_data_val.append(process_data(df_val.iloc[i]))"
667
+ ]
668
+ },
669
+ {
670
+ "cell_type": "code",
671
+ "execution_count": 19,
672
+ "metadata": {
673
+ "id": "ac76Rb6fqP_G"
674
+ },
675
+ "outputs": [],
676
+ "source": [
677
+ "# Training and validation dataframes\n",
678
+ "df_train = pd.DataFrame(processed_data_tr)\n",
679
+ "df_val = pd.DataFrame(processed_data_val)"
680
+ ]
681
+ },
682
+ {
683
+ "cell_type": "code",
684
+ "execution_count": 20,
685
+ "metadata": {
686
+ "colab": {
687
+ "base_uri": "https://localhost:8080/",
688
+ "height": 206
689
+ },
690
+ "id": "RdbHaVy_fd64",
691
+ "outputId": "a9aed834-81b7-4223-da42-6289799c2e1e"
692
+ },
693
+ "outputs": [
694
+ {
695
+ "data": {
696
+ "text/html": [
697
+ "<div>\n",
698
+ "<style scoped>\n",
699
+ " .dataframe tbody tr th:only-of-type {\n",
700
+ " vertical-align: middle;\n",
701
+ " }\n",
702
+ "\n",
703
+ " .dataframe tbody tr th {\n",
704
+ " vertical-align: top;\n",
705
+ " }\n",
706
+ "\n",
707
+ " .dataframe thead th {\n",
708
+ " text-align: right;\n",
709
+ " }\n",
710
+ "</style>\n",
711
+ "<table border=\"1\" class=\"dataframe\">\n",
712
+ " <thead>\n",
713
+ " <tr style=\"text-align: right;\">\n",
714
+ " <th></th>\n",
715
+ " <th>attention_mask</th>\n",
716
+ " <th>input_ids</th>\n",
717
+ " <th>label</th>\n",
718
+ " <th>text</th>\n",
719
+ " <th>token_type_ids</th>\n",
720
+ " </tr>\n",
721
+ " </thead>\n",
722
+ " <tbody>\n",
723
+ " <tr>\n",
724
+ " <th>0</th>\n",
725
+ " <td>[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...</td>\n",
726
+ " <td>[101, 2921, 3198, 23624, 2954, 6978, 2674, 841...</td>\n",
727
+ " <td>0</td>\n",
728
+ " <td>kept ask mani fight scream match swear gener m...</td>\n",
729
+ " <td>[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...</td>\n",
730
+ " </tr>\n",
731
+ " <tr>\n",
732
+ " <th>1</th>\n",
733
+ " <td>[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...</td>\n",
734
+ " <td>[101, 3422, 4372, 3775, 2099, 9587, 5737, 2071...</td>\n",
735
+ " <td>0</td>\n",
736
+ " <td>watch entir movi could watch entir movi stop d...</td>\n",
737
+ " <td>[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...</td>\n",
738
+ " </tr>\n",
739
+ " <tr>\n",
740
+ " <th>2</th>\n",
741
+ " <td>[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...</td>\n",
742
+ " <td>[101, 3543, 2293, 2358, 10050, 2128, 25300, 11...</td>\n",
743
+ " <td>1</td>\n",
744
+ " <td>touch love stori reminisc ¡in mood love draw h...</td>\n",
745
+ " <td>[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...</td>\n",
746
+ " </tr>\n",
747
+ " <tr>\n",
748
+ " <th>3</th>\n",
749
+ " <td>[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...</td>\n",
750
+ " <td>[101, 3732, 2154, 11865, 15472, 2072, 8040, 73...</td>\n",
751
+ " <td>0</td>\n",
752
+ " <td>latter day fulci schlocker total abysm concoct...</td>\n",
753
+ " <td>[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...</td>\n",
754
+ " </tr>\n",
755
+ " <tr>\n",
756
+ " <th>4</th>\n",
757
+ " <td>[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...</td>\n",
758
+ " <td>[101, 2034, 3813, 3669, 19337, 2666, 2615, 504...</td>\n",
759
+ " <td>0</td>\n",
760
+ " <td>first firmli believ norwegian movi continu get...</td>\n",
761
+ " <td>[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...</td>\n",
762
+ " </tr>\n",
763
+ " </tbody>\n",
764
+ "</table>\n",
765
+ "</div>"
766
+ ],
767
+ "text/plain": [
768
+ " attention_mask \\\n",
769
+ "0 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... \n",
770
+ "1 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... \n",
771
+ "2 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... \n",
772
+ "3 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... \n",
773
+ "4 [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... \n",
774
+ "\n",
775
+ " input_ids label \\\n",
776
+ "0 [101, 2921, 3198, 23624, 2954, 6978, 2674, 841... 0 \n",
777
+ "1 [101, 3422, 4372, 3775, 2099, 9587, 5737, 2071... 0 \n",
778
+ "2 [101, 3543, 2293, 2358, 10050, 2128, 25300, 11... 1 \n",
779
+ "3 [101, 3732, 2154, 11865, 15472, 2072, 8040, 73... 0 \n",
780
+ "4 [101, 2034, 3813, 3669, 19337, 2666, 2615, 504... 0 \n",
781
+ "\n",
782
+ " text \\\n",
783
+ "0 kept ask mani fight scream match swear gener m... \n",
784
+ "1 watch entir movi could watch entir movi stop d... \n",
785
+ "2 touch love stori reminisc ¡in mood love draw h... \n",
786
+ "3 latter day fulci schlocker total abysm concoct... \n",
787
+ "4 first firmli believ norwegian movi continu get... \n",
788
+ "\n",
789
+ " token_type_ids \n",
790
+ "0 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... \n",
791
+ "1 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... \n",
792
+ "2 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... \n",
793
+ "3 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... \n",
794
+ "4 [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... "
795
+ ]
796
+ },
797
+ "execution_count": 20,
798
+ "metadata": {},
799
+ "output_type": "execute_result"
800
+ }
801
+ ],
802
+ "source": [
803
+ "df_train.head()"
804
+ ]
805
+ },
806
+ {
807
+ "cell_type": "markdown",
808
+ "metadata": {
809
+ "id": "0lTWT8JwkRic"
810
+ },
811
+ "source": [
812
+ "## Fine-tuning the model\n",
813
+ "Fine-tuning BERT for the specific task of sentiment classification on the IMDB dataset"
814
+ ]
815
+ },
816
+ {
817
+ "cell_type": "code",
818
+ "execution_count": null,
819
+ "metadata": {},
820
+ "outputs": [],
821
+ "source": [
822
+ "import torch\n",
823
+ "import pyarrow as pa\n",
824
+ "from datasets import Dataset\n",
825
+ "import evaluate\n",
826
+ "import numpy as np"
827
+ ]
828
+ },
829
+ {
830
+ "cell_type": "code",
831
+ "execution_count": 21,
832
+ "metadata": {
833
+ "colab": {
834
+ "base_uri": "https://localhost:8080/"
835
+ },
836
+ "id": "kW53p7VQqUDD",
837
+ "outputId": "8231f3ba-37d5-4546-c4d0-6b4ff317ecf3"
838
+ },
839
+ "outputs": [
840
+ {
841
+ "data": {
842
+ "text/plain": [
843
+ "device(type='cuda', index=0)"
844
+ ]
845
+ },
846
+ "execution_count": 21,
847
+ "metadata": {},
848
+ "output_type": "execute_result"
849
+ }
850
+ ],
851
+ "source": [
852
+ "device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
853
+ "device"
854
+ ]
855
+ },
856
+ {
857
+ "cell_type": "code",
858
+ "execution_count": 24,
859
+ "metadata": {
860
+ "id": "68OdbTv5rLrm"
861
+ },
862
+ "outputs": [],
863
+ "source": [
864
+ "train_hg = Dataset(pa.Table.from_pandas(df_train))\n",
865
+ "valid_hg = Dataset(pa.Table.from_pandas(df_val))"
866
+ ]
867
+ },
868
+ {
869
+ "cell_type": "markdown",
870
+ "metadata": {},
871
+ "source": [
872
+ "## Evaluation metrics: F1 score and accuracy"
873
+ ]
874
+ },
875
+ {
876
+ "cell_type": "markdown",
877
+ "metadata": {},
878
+ "source": [
879
+ "`compute_metrics` computes both accuracy and F1 score to evaluate a classification model. The accuracy and F1 metrics are first loaded with `evaluate.load`. The `compute_metrics` function then receives `eval_pred`, a pair of arrays holding the model's predictions (logits) and the true labels. It takes the argmax of the predictions to obtain class labels, computes accuracy by comparing them with the true labels using the loaded accuracy metric, and computes the F1 score using the loaded F1 metric with \"weighted\" averaging. Both results are combined into a single dictionary, which is returned."
880
+ ]
881
+ },
882
+ {
883
+ "cell_type": "code",
884
+ "execution_count": 25,
885
+ "metadata": {
886
+ "id": "lUNhDPs0ry4m"
887
+ },
888
+ "outputs": [],
889
+ "source": [
890
+ "\n",
891
+ "# Load both accuracy and f1 metrics\n",
892
+ "accuracy_metric = evaluate.load(\"accuracy\")\n",
893
+ "f1_metric = evaluate.load(\"f1\")\n",
894
+ "\n",
895
+ "# Metric helper method\n",
896
+ "def compute_metrics(eval_pred):\n",
897
+ "    predictions, labels = eval_pred\n",
898
+ "    predictions = np.argmax(predictions, axis=1)\n",
899
+ "\n",
900
+ "    # Compute accuracy\n",
901
+ "    accuracy = accuracy_metric.compute(predictions=predictions, references=labels)\n",
902
+ "\n",
903
+ "    # Compute F1 score\n",
904
+ "    f1 = f1_metric.compute(predictions=predictions, references=labels, average=\"weighted\")\n",
905
+ "\n",
906
+ "    # Combine the metrics into a single dictionary\n",
907
+ "    combined_metrics = {\n",
908
+ "        'accuracy': accuracy['accuracy'],\n",
909
+ "        'f1': f1['f1']\n",
910
+ "    }\n",
911
+ "\n",
912
+ "    return combined_metrics"
913
+ ]
914
+ },
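For readers who want to see what the two reported numbers mean, the sketch below recomputes accuracy and support-weighted F1 in plain Python. It is a didactic re-implementation under the usual definitions, not the code behind `evaluate.load`; the function names and the toy labels are made up for illustration.

```python
from collections import Counter


def accuracy(preds, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def weighted_f1(preds, labels):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(labels)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(p == cls and y == cls for p, y in zip(preds, labels))
        n_pred = sum(p == cls for p in preds)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * n_true
    return total / len(labels)


if __name__ == "__main__":
    toy_preds, toy_labels = [1, 1, 0, 0], [1, 0, 0, 0]
    print(accuracy(toy_preds, toy_labels), weighted_f1(toy_preds, toy_labels))
```

On a balanced dataset such as this one, weighted F1 and accuracy tend to land close together, which matches the near-identical `eval_accuracy` and `eval_f1` values in the training log.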
915
+ {
916
+ "cell_type": "code",
917
+ "execution_count": 26,
918
+ "metadata": {
919
+ "colab": {
920
+ "base_uri": "https://localhost:8080/"
921
+ },
922
+ "id": "9jJYTWsHjnEc",
923
+ "outputId": "fe45691a-4476-4978-89b8-15f36465c37c"
924
+ },
925
+ "outputs": [
926
+ {
927
+ "name": "stdout",
928
+ "output_type": "stream",
929
+ "text": [
930
+ "Name: accelerateNote: you may need to restart the kernel to use updated packages.\n",
931
+ "\n",
932
+ "Version: 0.31.0\n",
933
+ "Summary: Accelerate\n",
934
+ "Home-page: https://github.com/huggingface/accelerate\n",
935
+ "Author: The HuggingFace team\n",
936
+ "Author-email: zach.mueller@huggingface.co\n",
937
+ "License: Apache\n",
938
+ "Location: c:\\Users\\andre\\1JUPYTER\\dt_labs\\.venv\\Lib\\site-packages\n",
939
+ "Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch\n",
940
+ "Required-by: \n",
941
+ "---\n",
942
+ "Name: transformers\n",
943
+ "Version: 4.41.2\n",
944
+ "Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow\n",
945
+ "Home-page: https://github.com/huggingface/transformers\n",
946
+ "Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)\n",
947
+ "Author-email: transformers@huggingface.co\n",
948
+ "License: Apache 2.0 License\n",
949
+ "Location: c:\\Users\\andre\\1JUPYTER\\dt_labs\\.venv\\Lib\\site-packages\n",
950
+ "Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm\n",
951
+ "Required-by: \n"
952
+ ]
953
+ }
954
+ ],
955
+ "source": [
956
+ "%pip show accelerate transformers"
957
+ ]
958
+ },
959
+ {
960
+ "cell_type": "markdown",
961
+ "metadata": {},
962
+ "source": [
963
+ "## Training the model"
964
+ ]
965
+ },
966
+ {
967
+ "cell_type": "code",
968
+ "execution_count": 27,
969
+ "metadata": {
970
+ "colab": {
971
+ "base_uri": "https://localhost:8080/"
972
+ },
973
+ "id": "QlaLCwf7rLtp",
974
+ "outputId": "7e10e82a-8bc7-478b-851e-c7b628b46c41"
975
+ },
976
+ "outputs": [
977
+ {
978
+ "name": "stderr",
979
+ "output_type": "stream",
980
+ "text": [
981
+ "c:\\Users\\andre\\1JUPYTER\\dt_labs\\.venv\\Lib\\site-packages\\transformers\\training_args.py:1474: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead\n",
982
+ " warnings.warn(\n"
983
+ ]
984
+ }
985
+ ],
986
+ "source": [
987
+ "from transformers import TrainingArguments, Trainer\n",
988
+ "\n",
989
+ "EPOCHS = 1\n",
990
+ "\n",
991
+ "training_args = TrainingArguments(output_dir=\"./result\",\n",
992
+ "                                  evaluation_strategy=\"epoch\",\n",
993
+ "                                  num_train_epochs=EPOCHS,\n",
994
+ "                                  per_device_train_batch_size=16,\n",
995
+ "                                  per_device_eval_batch_size=8\n",
996
+ "                                  )\n",
997
+ "\n",
998
+ "trainer = Trainer(\n",
999
+ "    model=model,\n",
1000
+ "    args=training_args,\n",
1001
+ "    train_dataset=train_hg,\n",
1002
+ "    eval_dataset=valid_hg,\n",
1003
+ "    tokenizer=tokenizer,\n",
1004
+ "    compute_metrics=compute_metrics\n",
1005
+ ")"
1006
+ ]
1007
+ },
1008
+ {
1009
+ "cell_type": "code",
1010
+ "execution_count": 28,
1011
+ "metadata": {},
1012
+ "outputs": [
1013
+ {
1014
+ "name": "stdout",
1015
+ "output_type": "stream",
1016
+ "text": [
1017
+ "CUDA available: True\n",
1018
+ "CUDA version: 12.1\n"
1019
+ ]
1020
+ }
1021
+ ],
1022
+ "source": [
1023
+ "print(\"CUDA available: \", torch.cuda.is_available())\n",
1024
+ "print(\"CUDA version: \", torch.version.cuda)"
1025
+ ]
1026
+ },
1027
+ {
1028
+ "cell_type": "code",
1029
+ "execution_count": 29,
1030
+ "metadata": {
1031
+ "colab": {
1032
+ "base_uri": "https://localhost:8080/",
1033
+ "height": 141
1034
+ },
1035
+ "id": "3s6lVFz_rLwO",
1036
+ "outputId": "ee64e8e9-9c8c-42a8-c355-f51410cc33df"
1037
+ },
1038
+ "outputs": [
1039
+ {
1040
+ "name": "stderr",
1041
+ "output_type": "stream",
1042
+ "text": [
1043
+ " 0%| | 0/2500 [00:00<?, ?it/s]c:\\Users\\andre\\1JUPYTER\\dt_labs\\.venv\\Lib\\site-packages\\transformers\\models\\bert\\modeling_bert.py:435: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\\aten\\src\\ATen\\native\\transformers\\cuda\\sdp_utils.cpp:263.)\n",
1044
+ " attn_output = torch.nn.functional.scaled_dot_product_attention(\n",
1045
+ " 20%|██ | 500/2500 [05:35<22:22, 1.49it/s]"
1046
+ ]
1047
+ },
1048
+ {
1049
+ "name": "stdout",
1050
+ "output_type": "stream",
1051
+ "text": [
1052
+ "{'loss': 0.4994, 'grad_norm': 12.613661766052246, 'learning_rate': 4e-05, 'epoch': 0.2}\n"
1053
+ ]
1054
+ },
1055
+ {
1056
+ "name": "stderr",
1057
+ "output_type": "stream",
1058
+ "text": [
1059
+ " 40%|████ | 1000/2500 [11:13<16:46, 1.49it/s]"
1060
+ ]
1061
+ },
1062
+ {
1063
+ "name": "stdout",
1064
+ "output_type": "stream",
1065
+ "text": [
1066
+ "{'loss': 0.3898, 'grad_norm': 4.661791801452637, 'learning_rate': 3e-05, 'epoch': 0.4}\n"
1067
+ ]
1068
+ },
1069
+ {
1070
+ "name": "stderr",
1071
+ "output_type": "stream",
1072
+ "text": [
1073
+ " 60%|██████ | 1500/2500 [16:47<11:02, 1.51it/s]"
1074
+ ]
1075
+ },
1076
+ {
1077
+ "name": "stdout",
1078
+ "output_type": "stream",
1079
+ "text": [
1080
+ "{'loss': 0.3516, 'grad_norm': 1.5203113555908203, 'learning_rate': 2e-05, 'epoch': 0.6}\n"
1081
+ ]
1082
+ },
1083
+ {
1084
+ "name": "stderr",
1085
+ "output_type": "stream",
1086
+ "text": [
1087
+ " 80%|████████ | 2000/2500 [22:25<05:33, 1.50it/s]"
1088
+ ]
1089
+ },
1090
+ {
1091
+ "name": "stdout",
1092
+ "output_type": "stream",
1093
+ "text": [
1094
+ "{'loss': 0.3121, 'grad_norm': 8.331348419189453, 'learning_rate': 1e-05, 'epoch': 0.8}\n"
1095
+ ]
1096
+ },
1097
+ {
1098
+ "name": "stderr",
1099
+ "output_type": "stream",
1100
+ "text": [
1101
+ "100%|██████████| 2500/2500 [28:04<00:00, 1.50it/s]"
1102
+ ]
1103
+ },
1104
+ {
1105
+ "name": "stdout",
1106
+ "output_type": "stream",
1107
+ "text": [
1108
+ "{'loss': 0.2882, 'grad_norm': 6.287994861602783, 'learning_rate': 0.0, 'epoch': 1.0}\n"
1109
+ ]
1110
+ },
1111
+ {
1112
+ "name": "stderr",
1113
+ "output_type": "stream",
1114
+ "text": [
1115
+ " \n",
1116
+ "100%|██████████| 2500/2500 [30:45<00:00, 1.35it/s]"
1117
+ ]
1118
+ },
1119
+ {
1120
+ "name": "stdout",
1121
+ "output_type": "stream",
1122
+ "text": [
1123
+ "{'eval_loss': 0.283893883228302, 'eval_accuracy': 0.883, 'eval_f1': 0.8829425082505502, 'eval_runtime': 159.717, 'eval_samples_per_second': 62.611, 'eval_steps_per_second': 7.826, 'epoch': 1.0}\n",
1124
+ "{'train_runtime': 1845.2907, 'train_samples_per_second': 21.677, 'train_steps_per_second': 1.355, 'train_loss': 0.3682089477539062, 'epoch': 1.0}\n"
1125
+ ]
1126
+ },
1127
+ {
1128
+ "name": "stderr",
1129
+ "output_type": "stream",
1130
+ "text": [
1131
+ "\n"
1132
+ ]
1133
+ },
1134
+ {
1135
+ "data": {
1136
+ "text/plain": [
1137
+ "TrainOutput(global_step=2500, training_loss=0.3682089477539062, metrics={'train_runtime': 1845.2907, 'train_samples_per_second': 21.677, 'train_steps_per_second': 1.355, 'total_flos': 4111110240000000.0, 'train_loss': 0.3682089477539062, 'epoch': 1.0})"
1138
+ ]
1139
+ },
1140
+ "execution_count": 29,
1141
+ "metadata": {},
1142
+ "output_type": "execute_result"
1143
+ }
1144
+ ],
1145
+ "source": [
1146
+ "trainer.train()"
1147
+ ]
1148
+ },
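The numbers in the training log can be sanity-checked by hand. The 2,500 optimizer steps are consistent with 40,000 training rows at a batch size of 16, and the logged learning rates (4e-05 at step 500, falling to 0.0 at step 2500) are consistent with a linear decay from 5e-05 without warmup. Both the starting rate and the schedule are assumptions here, taken to be the `TrainingArguments` defaults since the notebook sets neither; the helper names are hypothetical.

```python
import math


def steps_per_epoch(n_rows, batch_size):
    """Number of optimizer steps needed to see every row once."""
    return math.ceil(n_rows / batch_size)


def linear_lr(step, total_steps, base_lr=5e-5):
    """Linear decay from base_lr to 0 with no warmup (assumed schedule)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)


if __name__ == "__main__":
    total = steps_per_epoch(40_000, 16)
    for step in (500, 1000, 1500, 2000, 2500):
        print(step, linear_lr(step, total))
```

If the logged rates did not follow this pattern, that would point to a non-default `learning_rate`, `lr_scheduler_type`, or warmup setting.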
1149
+ {
1150
+ "cell_type": "markdown",
1151
+ "metadata": {},
1152
+ "source": [
1153
+ "## Saving the model"
1154
+ ]
1155
+ },
1156
+ {
1157
+ "cell_type": "code",
1158
+ "execution_count": 38,
1159
+ "metadata": {
1160
+ "id": "8eO6WDiOBAhg"
1161
+ },
1162
+ "outputs": [],
1163
+ "source": [
1164
+ "torch.save(model.state_dict(), 'model.pth')"
1165
+ ]
1166
+ },
1167
+ {
1168
+ "cell_type": "markdown",
1169
+ "metadata": {
1170
+ "id": "FtVZztSa40b3"
1171
+ },
1172
+ "source": [
1173
+ "## Testing individual predictions"
1174
+ ]
1175
+ },
1176
+ {
1177
+ "cell_type": "code",
1178
+ "execution_count": 34,
1179
+ "metadata": {
1180
+ "id": "lOHVSyfJJ8zK"
1181
+ },
1182
+ "outputs": [],
1183
+ "source": [
1184
+ "from transformers import AutoTokenizer\n",
1185
+ "\n",
1186
+ "new_tokenizer = AutoTokenizer.from_pretrained(pre_trained_base)"
1187
+ ]
1188
+ },
1189
+ {
1190
+ "cell_type": "code",
1191
+ "execution_count": 35,
1192
+ "metadata": {
1193
+ "id": "t-T7hDZ2J1Qk"
1194
+ },
1195
+ "outputs": [],
1196
+ "source": [
1197
+ "def get_prediction(text):\n",
1198
+ "    encoding = new_tokenizer(text, return_tensors=\"pt\", padding=\"max_length\", truncation=True, max_length=MAX_LEN)\n",
1199
+ "    encoding = {k: v.to(trainer.model.device) for k,v in encoding.items()}\n",
1200
+ "\n",
1201
+ "    outputs = model(**encoding)\n",
1202
+ "\n",
1203
+ "    logits = outputs.logits\n",
1204
+ "\n",
1205
+ "    # Softmax fits single-label classification; sigmoid is meant for multi-label outputs\n",
1206
+ "    probs = torch.softmax(logits.squeeze().cpu(), dim=-1)\n",
1207
+ "    probs = probs.detach().numpy()\n",
1208
+ "    label = np.argmax(probs, axis=-1)\n",
1209
+ "\n",
1210
+ "    return label"
1211
+ ]
1212
+ },
1213
+ {
1214
+ "cell_type": "code",
1215
+ "execution_count": 36,
1216
+ "metadata": {
1217
+ "colab": {
1218
+ "base_uri": "https://localhost:8080/"
1219
+ },
1220
+ "id": "y4dxQ4oYJ5C1",
1221
+ "outputId": "d0d77c2d-aff6-412b-e22a-0b721f5b097e"
1222
+ },
1223
+ "outputs": [
1224
+ {
1225
+ "data": {
1226
+ "text/plain": [
1227
+ "0"
1228
+ ]
1229
+ },
1230
+ "execution_count": 36,
1231
+ "metadata": {},
1232
+ "output_type": "execute_result"
1233
+ }
1234
+ ],
1235
+ "source": [
1236
+ "get_prediction(\"This movie is horrible!\")"
1237
+ ]
1238
+ },
1239
+ {
1240
+ "cell_type": "code",
1241
+ "execution_count": 37,
1242
+ "metadata": {
1243
+ "colab": {
1244
+ "base_uri": "https://localhost:8080/"
1245
+ },
1246
+ "id": "JXAyOu_6AqoO",
1247
+ "outputId": "ffcd019e-4c0c-45eb-f538-d2860c53a0e0"
1248
+ },
1249
+ "outputs": [
1250
+ {
1251
+ "data": {
1252
+ "text/plain": [
1253
+ "1"
1254
+ ]
1255
+ },
1256
+ "execution_count": 37,
1257
+ "metadata": {},
1258
+ "output_type": "execute_result"
1259
+ }
1260
+ ],
1261
+ "source": [
1262
+ "get_prediction(\"This movie is awesome!\")"
1263
+ ]
1264
+ }
1265
+ ],
1266
+ "metadata": {
1267
+ "accelerator": "GPU",
1268
+ "colab": {
1269
+ "provenance": []
1270
+ },
1271
+ "gpuClass": "standard",
1272
+ "kernelspec": {
1273
+ "display_name": "Python 3",
1274
+ "name": "python3"
1275
+ },
1276
+ "language_info": {
1277
+ "codemirror_mode": {
1278
+ "name": "ipython",
1279
+ "version": 3
1280
+ },
1281
+ "file_extension": ".py",
1282
+ "mimetype": "text/x-python",
1283
+ "name": "python",
1284
+ "nbconvert_exporter": "python",
1285
+ "pygments_lexer": "ipython3",
1286
+ "version": "3.10.11"
1287
+ }
1288
+ },
1289
+ "nbformat": 4,
1290
+ "nbformat_minor": 0
1291
+ }
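A note on the `get_prediction` cell in `Estatistico.ipynb`: it passes the logits through a sigmoid and then takes the argmax. Since the sigmoid is monotonic, this picks the same label as taking the argmax of the raw logits, but the resulting scores do not sum to 1. If class probabilities are wanted, softmax is the usual choice for a single-label classifier. The sketch below illustrates this in plain Python. The logits are made-up values for illustration, not outputs of the notebook's model, and the helper names are hypothetical.

```python
import math


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(xs):
    """Index of the largest value."""
    return max(range(len(xs)), key=lambda i: xs[i])


# Hypothetical logits for the classes (negative, positive).
example_logits = [-1.3, 2.1]

sigmoid_scores = [sigmoid(x) for x in example_logits]
softmax_probs = softmax(example_logits)

# All three routes select the same class index.
label_from_logits = argmax(example_logits)
label_from_sigmoid = argmax(sigmoid_scores)
label_from_softmax = argmax(softmax_probs)

print(label_from_logits, label_from_sigmoid, label_from_softmax)
print(sum(sigmoid_scores), sum(softmax_probs))
```

The predicted label is therefore unaffected by the choice of sigmoid. The distinction only matters if the scores themselves are reported as probabilities.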
notebooks_explicativos/Simbolico.ipynb ADDED
The diff for this file is too large to render. See raw diff