{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# SCC0633/SCC5908 - Natural Language Processing\n", "> **Instructor:** Thiago Alexandre Salgueiro Pardo \\\\\n", "> **PAE Teaching Assistant:** Germano Antonio Zani Jorge\n", "\n", "\n", "# Group Members: GPTrouxas\n", "> André Guarnier De Mitri - 11395579 \\\\\n", "> Daniel Carvalho - 10685702 \\\\\n", "> Fernando - 11795342 \\\\\n", "> Lucas Henrique Sant'Anna - 10748521 \\\\\n", "> Magaly L Fujimoto - 4890582 \\\\\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Neural Approach Using BERT\n", "![Neural approach using BERT](../imagens/BERT_TDIDF.png)" ] }, { "cell_type": "markdown", "metadata": { "id": "6yecpJR0feeQ" }, "source": [ "## Importing libraries" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "FAIvyZwodEtm" }, "outputs": [], "source": [ "import torch\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import math\n", "from tqdm.notebook import tqdm\n", "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "#!pip install transformers seaborn nltk" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading data" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "LYgXl3RIfgfo", "outputId": "eb496faf-7826-44f7-fa88-3b21fb6e7cbf" }, "outputs": [ { "data": { "text/html": [ "

|   | review | sentiment |
|---|---|---|
| 0 | One of the other reviewers has mentioned that ... | positive |
| 1 | A wonderful little production. `<br /><br />`The... | positive |
| 2 | I thought this was a wonderful way to spend ti... | positive |
| 3 | Basically there's a family where a little boy ... | negative |
| 4 | Petter Mattei's "Love in the Time of Money" is... | positive |
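The raw reviews still contain HTML line breaks (`<br /><br />`) and punctuation, and neither appears in the stemmed reviews shown later. A minimal cleaning step could look like the sketch below. The helper name `clean_review` and the exact regular expressions are assumptions for illustration and do not come from the group's notebook. Digits are kept because the preprocessed text still contains tokens such as `1`.

```python
import re

# Hypothetical cleaning helper (not taken from the original notebook)
BR_TAG = re.compile(r"<br\s*/?>", flags=re.IGNORECASE)  # matches <br>, <br/>, <br />
NON_ALNUM = re.compile(r"[^a-z0-9\s]")                  # anything but lowercase ASCII letters, digits, whitespace
SPACES = re.compile(r"\s+")


def clean_review(text: str) -> str:
    """Replace <br /> tags with spaces, lowercase, drop punctuation, collapse whitespace."""
    text = BR_TAG.sub(" ", text)
    text = text.lower()
    text = NON_ALNUM.sub(" ", text)
    return SPACES.sub(" ", text).strip()


print(clean_review("A wonderful little production. <br /><br />The filming technique"))
# -> a wonderful little production the filming technique
```

Because `NON_ALNUM` only keeps ASCII letters, accented characters are also dropped. That is acceptable for this English corpus but would need a different pattern for other languages.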

|   | review | sentiment |
|---|---|---|
| 0 | One of the other reviewers has mentioned that ... | 1 |
| 1 | A wonderful little production. `<br /><br />`The... | 1 |
| 2 | I thought this was a wonderful way to spend ti... | 1 |
| 3 | Basically there's a family where a little boy ... | 0 |
| 4 | Petter Mattei's "Love in the Time of Money" is... | 1 |
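Comparing the first two tables, the only change is in the `sentiment` column, which goes from the strings `positive`/`negative` to the integers `1`/`0`. Below is a minimal pandas sketch of that mapping. It assumes a DataFrame named `df` with the same two columns, and the sample rows are abridged stand-ins rather than the real dataset.

```python
import pandas as pd

# Small stand-in for the dataset; column names match the tables above
df = pd.DataFrame({
    "review": [
        "A wonderful little production.",
        "Basically there's a family where a little boy",
    ],
    "sentiment": ["positive", "negative"],
})

# positive -> 1, negative -> 0; any label missing from the dict becomes NaN
df["sentiment"] = df["sentiment"].map({"positive": 1, "negative": 0})

# Fail early if some label was outside the mapping
assert df["sentiment"].notna().all()
print(df)
```

The `notna()` check is there because `Series.map` with a dict does not raise on unknown keys. It silently turns them into missing values, which would later break training or skew the labels.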

|   | review | sentiment |
|---|---|---|
| 0 | one review mention watch 1 oz episod hook righ... | 1 |
| 1 | wonder littl product film techniqu unassum old... | 1 |
| 2 | thought wonder way spend time hot summer weeke... | 1 |
| 3 | basic famili littl boy jake think zombi closet... | 0 |
| 4 | petter mattei love time money visual stun film... | 1 |
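This table shows the reviews after stopword removal and stemming: `wonderful` becomes `wonder`, `little` becomes `littl` and `family` becomes `famili`. That pattern matches what the Porter stemmer produces. The sketch below is a guess at this step, since the notebook's actual stemmer and stopword list are not shown here:

- It assumes NLTK's `PorterStemmer`, which fits because `nltk` appears in the install cell.
- The tiny `STOPWORDS` set is a hypothetical stand-in for a full list such as `nltk.corpus.stopwords.words("english")`, which needs a separate corpus download.

```python
from nltk.stem import PorterStemmer

# Hypothetical, deliberately tiny stopword set; a real pipeline would use a full list
STOPWORDS = {"a", "an", "the", "of", "is", "was", "this", "there", "where", "i", "to", "that"}

stemmer = PorterStemmer()


def preprocess(text: str) -> str:
    """Lowercase, drop stopwords, and Porter-stem the remaining whitespace-separated tokens."""
    tokens = text.lower().split()
    kept = [tok for tok in tokens if tok not in STOPWORDS]
    return " ".join(stemmer.stem(tok) for tok in kept)


print(preprocess("a wonderful little production"))  # -> wonder littl product
```

Stemming shrinks the vocabulary, but it also produces non-words such as `littl`. When those strings are later fed to a WordPiece tokenizer, they tend to be split into more sub-word pieces than the original words would be.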

|   | attention_mask | input_ids | label | text | token_type_ids |
|---|---|---|---|---|---|
| 0 | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [101, 2921, 3198, 23624, 2954, 6978, 2674, 841... | 0 | kept ask mani fight scream match swear gener m... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 1 | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [101, 3422, 4372, 3775, 2099, 9587, 5737, 2071... | 0 | watch entir movi could watch entir movi stop d... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 2 | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [101, 3543, 2293, 2358, 10050, 2128, 25300, 11... | 1 | touch love stori reminisc in mood love draw h... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 3 | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [101, 3732, 2154, 11865, 15472, 2072, 8040, 73... | 0 | latter day fulci schlocker total abysm concoct... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
| 4 | [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ... | [101, 2034, 3813, 3669, 19337, 2666, 2615, 504... | 0 | first firmli believ norwegian movi continu get... | [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... |
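In this last table, each review has been tokenized into the three fields a BERT model consumes: `input_ids`, `attention_mask` and `token_type_ids`. Every `input_ids` list starts with `101`, the `[CLS]` id in the `bert-base-uncased` vocabulary, so that checkpoint is assumed here. The pure-Python sketch below (the function `encode_ids` is hypothetical) shows how the three fields relate once the word-piece ids are known:

- `[CLS]` and `[SEP]` are added around the word-piece ids.
- The sequence is truncated, or padded with `[PAD]` (id `0`), to a fixed length.
- The attention mask marks real tokens with `1` and padding with `0`.
- `token_type_ids` stay `0` because each review is a single segment.

```python
from typing import Dict, List

# Special-token ids of the bert-base-uncased vocabulary (assumed checkpoint)
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def encode_ids(piece_ids: List[int], max_len: int) -> Dict[str, List[int]]:
    """Wrap word-piece ids with [CLS]/[SEP], truncate or pad to max_len, and build the masks."""
    if max_len < 2:
        raise ValueError("max_len must leave room for [CLS] and [SEP]")
    body = piece_ids[: max_len - 2]  # truncate so [CLS] + body + [SEP] fits in max_len
    ids = [CLS_ID] + body + [SEP_ID]
    n_pad = max_len - len(ids)
    return {
        "input_ids": ids + [PAD_ID] * n_pad,
        "attention_mask": [1] * len(ids) + [0] * n_pad,  # 1 = real token, 0 = padding
        "token_type_ids": [0] * max_len,                  # single segment -> all zeros
    }


enc = encode_ids([2921, 3198, 23624], max_len=8)
print(enc["input_ids"])       # -> [101, 2921, 3198, 23624, 102, 0, 0, 0]
print(enc["attention_mask"])  # -> [1, 1, 1, 1, 1, 0, 0, 0]
```

With Hugging Face `transformers`, calling a BERT tokenizer with `padding="max_length"`, `truncation=True` and a `max_length` returns these same three fields directly. The sketch only makes the padding and masking logic explicit.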