{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e3b3364f"
      },
      "source": [
        "# Youtube Hate Speech ML Project\n",
        "\n"
      ],
      "id": "e3b3364f"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "12f895df"
      },
      "outputs": [],
      "source": [
        "# First thing is to import libraries.\n",
        "# I am familiar with pandas and numpy, but lets research the other libraries."
      ],
      "id": "12f895df"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "7cfcb724"
      },
      "outputs": [],
      "source": [
        "import pandas as pd\n",
        "import numpy as np\n",
        "from sklearn.feature_extraction.text import CountVectorizer\n",
        "from sklearn.model_selection import train_test_split\n",
        "from sklearn.tree import DecisionTreeClassifier"
      ],
      "id": "7cfcb724"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6b1fd69c"
      },
      "source": [
        "### Feature Extraction\n",
        "- `from sklearn.feature_extraction.text import CountVectorizer` coverts text documents into a matrix\n",
        "  of token counts\n",
        "- count vectorizers assign numbers to all instances of \"features\", or the words in a document.\n",
        "- `model_selection` provides tools for model selection and evaluation.\n",
        "- `train_test_split` will split our data into training and testing sets.\n",
        "- we use the DecisionTreeClassifier to train our model. We will use a decision tree to create labels and classify thise labels."
      ],
      "id": "6b1fd69c"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ad8a3dad"
      },
      "source": [
        "- stopwords are words frequently occuring in a language and removed during text preprocessing.\n",
        "- `import pr` prints objects in a human-readable format and is used for working with nltk.\n",
        "- `from nltk.stem.snowball import SnowballStemmer` imports the stemming algorithm, which reduces words to their stem.\n",
        "- An example of the Stemming aglorithm is reducing the word 'running' to 'run'\n"
      ],
      "id": "ad8a3dad"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "60ab1d26",
        "outputId": "9509eed2-199d-4559-988a-fd1780e6ea3c"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
            "[nltk_data]   Package stopwords is already up-to-date!\n"
          ]
        }
      ],
      "source": [
        "import re\n",
        "import string\n",
        "import nltk\n",
        "nltk.download('stopwords')\n",
        "from nltk.stem.snowball import SnowballStemmer"
      ],
      "id": "60ab1d26"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "538a8bf3"
      },
      "source": [
        "### Imports"
      ],
      "id": "538a8bf3"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "189098f2"
      },
      "source": [
        "- `import re` imports the \"regular expressions\" module, which allows us to work with strings easier.\n",
        "- `import string` imports the `string` modile, which allows for easier string formatting for text processing tasks.\n",
        "\n",
        "### Import `nltk` is SUPER important. it does the following: Tokenization: Breaking text into words or sentences.\n",
        "\n",
        "1. Part-of-Speech Tagging: Assigning grammatical categories (e.g., noun, verb, adjective) to words in a sentence.\n",
        "2. Named Entity Recognition (NER): Identifying and classifying named entities such as persons, organizations, and locations in text.\n",
        "3. Parsing: Analyzing the syntactic structure of sentences.\n",
        "4. WordNet Integration: Accessing lexical database for English, WordNet, for synonyms, antonyms, and other lexical relationships.\n",
        "5. Text Corpora: Access to various text corpora for training and testing NLP models.\n",
        "5. Text Classification And More\n",
        "\n",
        "`nltk.corpus` is a module by Natural language toolkit. Contains large sets of text for linguistic analysis and development."
      ],
      "id": "189098f2"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b375415f"
      },
      "source": [
        "`.words` method access words from our corpus and in this instance, we are calling the stopwords in the english set."
      ],
      "id": "b375415f"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "bae03f72",
        "scrolled": true
      },
      "outputs": [],
      "source": [
        "stemmer = nltk.SnowballStemmer(\"english\")\n",
        "from nltk.corpus import stopwords\n",
        "import string\n",
        "stopword = set(stopwords.words(\"english\"))"
      ],
      "id": "bae03f72"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b13b94ce"
      },
      "source": [
        "### The dataset we will use can be found here:\n",
        "https://www.youtube.com/redirect?event=video_description&redir_token=QUFFLUhqbmp2NDVmdmoyNGpSS2hrdGNwdndvRl9WZnpqQXxBQ3Jtc0traEYtV0J4Ym5iYlJJa05tOXJyc1RSOEVzcnhOTUVxbU9YQnV2TWNZWVZ4WWZwRThCOTRUR2hFam9mbDZ5cW1Pa0VfRXhTcmhVaTBvX3pWeUN0THhwYVQycWZVbE1vcmxnakphdFF3SGxCMXhDNF9FUQ&q=https%3A%2F%2Fdrive.google.com%2Fdrive%2Ffolders%2F1uQiyJ_mDlOCcecMw7C-JYUs9bGnVJ_j8%3Fusp%3Dsharing&v=jbexvUovHxw"
      ],
      "id": "b13b94ce"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bf758de5"
      },
      "source": [
        "- we import our dataset\n",
        "- we dropped any `NaN` values\n",
        "- we summoned the info of our dataset."
      ],
      "id": "bf758de5"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "id": "6de55c38",
        "outputId": "5fad1bd7-e378-4a3b-9ef3-029e9e0762e6"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   Unnamed: 0  count  hate_speech  offensive_language  neither  class  \\\n",
              "0           0      3            0                   0        3      2   \n",
              "1           1      3            0                   3        0      1   \n",
              "2           2      3            0                   3        0      1   \n",
              "3           3      3            0                   2        1      1   \n",
              "4           4      6            0                   6        0      1   \n",
              "\n",
              "                                               tweet  \n",
              "0  !!! RT @mayasolovely: As a woman you shouldn't...  \n",
              "1  !!!!! RT @mleew17: boy dats cold...tyga dwn ba...  \n",
              "2  !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...  \n",
              "3  !!!!!!!!! RT @C_G_Anderson: @viva_based she lo...  \n",
              "4  !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-b943ad6d-51e3-47d4-8a8f-326b0b59f91b\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Unnamed: 0</th>\n",
              "      <th>count</th>\n",
              "      <th>hate_speech</th>\n",
              "      <th>offensive_language</th>\n",
              "      <th>neither</th>\n",
              "      <th>class</th>\n",
              "      <th>tweet</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>2</td>\n",
              "      <td>!!! RT @mayasolovely: As a woman you shouldn't...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>1</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>!!!!! RT @mleew17: boy dats cold...tyga dwn ba...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>2</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>3</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>4</td>\n",
              "      <td>6</td>\n",
              "      <td>0</td>\n",
              "      <td>6</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-b943ad6d-51e3-47d4-8a8f-326b0b59f91b')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-b943ad6d-51e3-47d4-8a8f-326b0b59f91b button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-b943ad6d-51e3-47d4-8a8f-326b0b59f91b');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-e2ac6c4d-fa95-48d5-a9b9-4bf1eefaef2e\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-e2ac6c4d-fa95-48d5-a9b9-4bf1eefaef2e')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-e2ac6c4d-fa95-48d5-a9b9-4bf1eefaef2e button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df1",
              "summary": "{\n  \"name\": \"df1\",\n  \"rows\": 24783,\n  \"fields\": [\n    {\n      \"column\": \"Unnamed: 0\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 7299,\n        \"min\": 0,\n        \"max\": 25296,\n        \"num_unique_values\": 24783,\n        \"samples\": [\n          2326,\n          16283,\n          19362\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"count\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 3,\n        \"max\": 9,\n        \"num_unique_values\": 5,\n        \"samples\": [\n          6,\n          7,\n          9\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"hate_speech\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 7,\n        \"num_unique_values\": 8,\n        \"samples\": [\n          1,\n          6,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"offensive_language\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1,\n        \"min\": 0,\n        \"max\": 9,\n        \"num_unique_values\": 10,\n        \"samples\": [\n          8,\n          3,\n          7\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"neither\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1,\n        \"min\": 0,\n        \"max\": 9,\n        \"num_unique_values\": 10,\n        \"samples\": [\n          8,\n          0,\n          4\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"class\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 2,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          2,\n          1,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"tweet\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 24783,\n        \"samples\": [\n          \"934 8616\\ni got a missed call from yo bitch\",\n          \"RT @KINGTUNCHI_: Fucking with a bad bitch you gone need some money lil homie!\",\n          \"RT @eanahS__: @1inkkofrosess lol my credit ain't no where near good , but I know the right man for the job .. that ho nice though!\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 38
        }
      ],
      "source": [
        "df1 = pd.read_csv(\"twitter_data.csv\")\n",
        "df1 = df1.dropna()\n",
        "df1.head()"
      ],
      "id": "6de55c38"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4c288f90"
      },
      "source": [
        "#### `.tolist()` converts NumPy arrays into Python lists."
      ],
      "id": "4c288f90"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "dd22b72e",
        "outputId": "2fc7e2a7-d2ea-4e8d-ee2f-5a84eeaef1a7"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "['Unnamed: 0', 'count', 'hate_speech', 'offensive_language', 'neither', 'class', 'tweet']\n"
          ]
        }
      ],
      "source": [
        "print(df1.columns.tolist())\n"
      ],
      "id": "dd22b72e"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "f584d46b"
      },
      "source": [
        "- The `.map()` function applies a specified function to an iterable and returns the result.\n",
        "- We used the `.map` function to assign 0, 1, and 2 to \"Hate Speech Detected\", \"Offensive language detected\", and \"No hate and -  - offensive speech\""
      ],
      "id": "f584d46b"
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Preprocess the Labels"
      ],
      "metadata": {
        "id": "MSIgr88pMz8x"
      },
      "id": "MSIgr88pMz8x"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "117eadd5"
      },
      "outputs": [],
      "source": [
        "df1['labels'] = df1['class'].map({0:\"Hate Speech Detected\", 1:\"Offensive language detected\", 2:\"No hate and offensive speech\"})\n",
        "\n",
        "# Merging the labels\n",
        "def unify_labels(row):\n",
        "    if row['labels'] in ['Hate Speech Detected', 'Offensive language detected']:\n",
        "        return 'Offensive or Hate Speech'\n",
        "    else:\n",
        "        return 'Not Hate'\n",
        "\n",
        "# Apply this function to the dataset with three labels\n",
        "df1['labels'] = df1.apply(unify_labels, axis=1)"
      ],
      "id": "117eadd5"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "8fdf617f",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "outputId": "38e9261c-f906-476f-9179-71367b4b1c6b"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   Unnamed: 0  count  hate_speech  offensive_language  neither  class  \\\n",
              "0           0      3            0                   0        3      2   \n",
              "1           1      3            0                   3        0      1   \n",
              "2           2      3            0                   3        0      1   \n",
              "3           3      3            0                   2        1      1   \n",
              "4           4      6            0                   6        0      1   \n",
              "\n",
              "                                               tweet                    labels  \n",
              "0  !!! RT @mayasolovely: As a woman you shouldn't...                  Not Hate  \n",
              "1  !!!!! RT @mleew17: boy dats cold...tyga dwn ba...  Offensive or Hate Speech  \n",
              "2  !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...  Offensive or Hate Speech  \n",
              "3  !!!!!!!!! RT @C_G_Anderson: @viva_based she lo...  Offensive or Hate Speech  \n",
              "4  !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...  Offensive or Hate Speech  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-effaad54-4d6a-4c97-8779-67621681a49c\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Unnamed: 0</th>\n",
              "      <th>count</th>\n",
              "      <th>hate_speech</th>\n",
              "      <th>offensive_language</th>\n",
              "      <th>neither</th>\n",
              "      <th>class</th>\n",
              "      <th>tweet</th>\n",
              "      <th>labels</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>2</td>\n",
              "      <td>!!! RT @mayasolovely: As a woman you shouldn't...</td>\n",
              "      <td>Not Hate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>1</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>!!!!! RT @mleew17: boy dats cold...tyga dwn ba...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>2</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>3</td>\n",
              "      <td>3</td>\n",
              "      <td>0</td>\n",
              "      <td>2</td>\n",
              "      <td>1</td>\n",
              "      <td>1</td>\n",
              "      <td>!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>4</td>\n",
              "      <td>6</td>\n",
              "      <td>0</td>\n",
              "      <td>6</td>\n",
              "      <td>0</td>\n",
              "      <td>1</td>\n",
              "      <td>!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-effaad54-4d6a-4c97-8779-67621681a49c')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-effaad54-4d6a-4c97-8779-67621681a49c button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-effaad54-4d6a-4c97-8779-67621681a49c');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-67d609d9-3a8e-4d63-a1a1-577ef1179b42\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-67d609d9-3a8e-4d63-a1a1-577ef1179b42')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-67d609d9-3a8e-4d63-a1a1-577ef1179b42 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df1",
              "summary": "{\n  \"name\": \"df1\",\n  \"rows\": 24783,\n  \"fields\": [\n    {\n      \"column\": \"Unnamed: 0\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 7299,\n        \"min\": 0,\n        \"max\": 25296,\n        \"num_unique_values\": 24783,\n        \"samples\": [\n          2326,\n          16283,\n          19362\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"count\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 3,\n        \"max\": 9,\n        \"num_unique_values\": 5,\n        \"samples\": [\n          6,\n          7,\n          9\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"hate_speech\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 7,\n        \"num_unique_values\": 8,\n        \"samples\": [\n          1,\n          6,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"offensive_language\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1,\n        \"min\": 0,\n        \"max\": 9,\n        \"num_unique_values\": 10,\n        \"samples\": [\n          8,\n          3,\n          7\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"neither\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1,\n        \"min\": 0,\n        \"max\": 9,\n        \"num_unique_values\": 10,\n        \"samples\": [\n          8,\n          0,\n          4\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"class\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 0,\n        \"min\": 0,\n        \"max\": 2,\n        \"num_unique_values\": 3,\n        \"samples\": [\n          2,\n          1,\n          0\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"tweet\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 24783,\n        \"samples\": [\n          \"934 8616\\ni got a missed call from yo bitch\",\n          \"RT @KINGTUNCHI_: Fucking with a bad bitch you gone need some money lil homie!\",\n          \"RT @eanahS__: @1inkkofrosess lol my credit ain't no where near good , but I know the right man for the job .. that ho nice though!\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"labels\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"Offensive or Hate Speech\",\n          \"Not Hate\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 41
        }
      ],
      "source": [
        "df1['labels'].info\n",
        "df1.head()"
      ],
      "id": "8fdf617f"
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Import the second dataset"
      ],
      "metadata": {
        "id": "9DgbrPGdSk5O"
      },
      "id": "9DgbrPGdSk5O"
    },
    {
      "cell_type": "code",
      "source": [
        "!pip install datasets\n",
        "\n",
        "from datasets import load_dataset\n",
        "\n",
        "df2 = load_dataset(\"LennardZuendorf/Dynamically-Generated-Hate-Speech-Dataset\")"
      ],
      "metadata": {
        "id": "2VaMzXeZQQIz",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "431ad557-0e4d-42c9-ecfe-91ffa3598e17"
      },
      "id": "2VaMzXeZQQIz",
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Requirement already satisfied: datasets in /usr/local/lib/python3.10/dist-packages (2.18.0)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from datasets) (3.13.1)\n",
            "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from datasets) (1.25.2)\n",
            "Requirement already satisfied: pyarrow>=12.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (14.0.2)\n",
            "Requirement already satisfied: pyarrow-hotfix in /usr/local/lib/python3.10/dist-packages (from datasets) (0.6)\n",
            "Requirement already satisfied: dill<0.3.9,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (0.3.8)\n",
            "Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets) (1.5.3)\n",
            "Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (2.31.0)\n",
            "Requirement already satisfied: tqdm>=4.62.1 in /usr/local/lib/python3.10/dist-packages (from datasets) (4.66.2)\n",
            "Requirement already satisfied: xxhash in /usr/local/lib/python3.10/dist-packages (from datasets) (3.4.1)\n",
            "Requirement already satisfied: multiprocess in /usr/local/lib/python3.10/dist-packages (from datasets) (0.70.16)\n",
            "Requirement already satisfied: fsspec[http]<=2024.2.0,>=2023.1.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (2023.6.0)\n",
            "Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets) (3.9.3)\n",
            "Requirement already satisfied: huggingface-hub>=0.19.4 in /usr/local/lib/python3.10/dist-packages (from datasets) (0.20.3)\n",
            "Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from datasets) (24.0)\n",
            "Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from datasets) (6.0.1)\n",
            "Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (1.3.1)\n",
            "Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (23.2.0)\n",
            "Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (1.4.1)\n",
            "Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (6.0.5)\n",
            "Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (1.9.4)\n",
            "Requirement already satisfied: async-timeout<5.0,>=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (4.0.3)\n",
            "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.19.4->datasets) (4.10.0)\n",
            "Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets) (3.3.2)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets) (3.6)\n",
            "Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets) (2.0.7)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets) (2024.2.2)\n",
            "Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets) (2.8.2)\n",
            "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets) (2023.4)\n",
            "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->datasets) (1.16.0)\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# Convert it to a DataFrame\n",
        "df2 = df2['train'].to_pandas() # or dataset['split'].to_pandas()\n",
        "\n",
        "# Rename columns\n",
        "df2.rename(columns={'text': 'tweet', 'label': 'labels'}, inplace=True)\n",
        "\n",
        "# Replace label values\n",
        "label_mapping = {'hate': 'Offensive or Hate Speech', 'nothate': 'Not Hate'}\n",
        "df2['labels'] = df2['labels'].replace(label_mapping)"
      ],
      "metadata": {
        "id": "ImSWI82fQ4yb"
      },
      "id": "ImSWI82fQ4yb",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "df2['labels'].info\n",
        "df2.head()"
      ],
      "metadata": {
        "id": "GZDrcM1ESa5e",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 310
        },
        "outputId": "1a9516b2-72d9-4a57-fbcd-ad2fdf189dfe"
      },
      "id": "GZDrcM1ESa5e",
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   Unnamed: 0    acl.id  X1  \\\n",
              "0           1      acl1   1   \n",
              "1           2     acl10   2   \n",
              "2           3    acl100   3   \n",
              "3           4   acl1000   4   \n",
              "4           5  acl10000   5   \n",
              "\n",
              "                                               tweet  \\\n",
              "0                                dalits are lowlives   \n",
              "1             gay people are a burden to our society   \n",
              "2                              Arabs are not welcome   \n",
              "3  I'm not saying we should actually eliminate he...   \n",
              "4                       bananas are for black people   \n",
              "\n",
              "                     labels      type    target     level  split  round.base  \\\n",
              "0  Offensive or Hate Speech  notgiven  notgiven  original  train           1   \n",
              "1  Offensive or Hate Speech  notgiven  notgiven  original  train           1   \n",
              "2  Offensive or Hate Speech  notgiven  notgiven  original  train           1   \n",
              "3  Offensive or Hate Speech  notgiven  notgiven  original  train           1   \n",
              "4  Offensive or Hate Speech  notgiven  notgiven  original   test           1   \n",
              "\n",
              "    annotator round acl.id.matched  \n",
              "0  GNZuCtwed3     1           None  \n",
              "1  GNZuCtwed3     1           None  \n",
              "2  vDe7GN0NrL     1           None  \n",
              "3  oemYWm1Tjg     1           None  \n",
              "4  QiOKkCi7F8     1           None  "
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-3c09fa20-aab4-4093-a3a5-0eff8e1219b9\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>Unnamed: 0</th>\n",
              "      <th>acl.id</th>\n",
              "      <th>X1</th>\n",
              "      <th>tweet</th>\n",
              "      <th>labels</th>\n",
              "      <th>type</th>\n",
              "      <th>target</th>\n",
              "      <th>level</th>\n",
              "      <th>split</th>\n",
              "      <th>round.base</th>\n",
              "      <th>annotator</th>\n",
              "      <th>round</th>\n",
              "      <th>acl.id.matched</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>1</td>\n",
              "      <td>acl1</td>\n",
              "      <td>1</td>\n",
              "      <td>dalits are lowlives</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "      <td>notgiven</td>\n",
              "      <td>notgiven</td>\n",
              "      <td>original</td>\n",
              "      <td>train</td>\n",
              "      <td>1</td>\n",
              "      <td>GNZuCtwed3</td>\n",
              "      <td>1</td>\n",
              "      <td>None</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>2</td>\n",
              "      <td>acl10</td>\n",
              "      <td>2</td>\n",
              "      <td>gay people are a burden to our society</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "      <td>notgiven</td>\n",
              "      <td>notgiven</td>\n",
              "      <td>original</td>\n",
              "      <td>train</td>\n",
              "      <td>1</td>\n",
              "      <td>GNZuCtwed3</td>\n",
              "      <td>1</td>\n",
              "      <td>None</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>3</td>\n",
              "      <td>acl100</td>\n",
              "      <td>3</td>\n",
              "      <td>Arabs are not welcome</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "      <td>notgiven</td>\n",
              "      <td>notgiven</td>\n",
              "      <td>original</td>\n",
              "      <td>train</td>\n",
              "      <td>1</td>\n",
              "      <td>vDe7GN0NrL</td>\n",
              "      <td>1</td>\n",
              "      <td>None</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>4</td>\n",
              "      <td>acl1000</td>\n",
              "      <td>4</td>\n",
              "      <td>I'm not saying we should actually eliminate he...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "      <td>notgiven</td>\n",
              "      <td>notgiven</td>\n",
              "      <td>original</td>\n",
              "      <td>train</td>\n",
              "      <td>1</td>\n",
              "      <td>oemYWm1Tjg</td>\n",
              "      <td>1</td>\n",
              "      <td>None</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>5</td>\n",
              "      <td>acl10000</td>\n",
              "      <td>5</td>\n",
              "      <td>bananas are for black people</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "      <td>notgiven</td>\n",
              "      <td>notgiven</td>\n",
              "      <td>original</td>\n",
              "      <td>test</td>\n",
              "      <td>1</td>\n",
              "      <td>QiOKkCi7F8</td>\n",
              "      <td>1</td>\n",
              "      <td>None</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-3c09fa20-aab4-4093-a3a5-0eff8e1219b9')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-3c09fa20-aab4-4093-a3a5-0eff8e1219b9 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-3c09fa20-aab4-4093-a3a5-0eff8e1219b9');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-cdd04914-4f26-4de0-b669-7188b0a45f3b\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-cdd04914-4f26-4de0-b669-7188b0a45f3b')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-cdd04914-4f26-4de0-b669-7188b0a45f3b button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df2",
              "summary": "{\n  \"name\": \"df2\",\n  \"rows\": 82399,\n  \"fields\": [\n    {\n      \"column\": \"Unnamed: 0\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 12249,\n        \"min\": 1,\n        \"max\": 331000,\n        \"num_unique_values\": 41323,\n        \"samples\": [\n          20688,\n          24611,\n          24417\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"acl.id\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 41255,\n        \"samples\": [\n          \"acl32247\",\n          \"acl13850\",\n          \"acl3767\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"X1\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 11903,\n        \"min\": 1,\n        \"max\": 41255,\n        \"num_unique_values\": 41255,\n        \"samples\": [\n          24722,\n          4281,\n          30746\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"tweet\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 41134,\n        \"samples\": [\n          \"Interestingly, we have a fuckton of Qs in our community and they are one of the biggest bulliers of LGBTs! Awful!\",\n          \"My main problems are depression and family issues.  Soceity is fucked when you have people like my neighbour living next door. HELP ME.\",\n          \"The world needs fewer muslims and more white supremacists\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"labels\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"Not Hate\",\n          \"Offensive or Hate Speech\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"type\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 7,\n        \"samples\": [\n          \"notgiven\",\n          \"none\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"target\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 409,\n        \"samples\": [\n          \"bla.wom, asi.south, mixed.race\",\n          \"dis, african\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"level\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"perturbation\",\n          \"original\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"split\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 3,\n        \"samples\": [\n          \"train\",\n          \"test\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"round.base\",\n      \"properties\": {\n        \"dtype\": \"number\",\n        \"std\": 1,\n        \"min\": 1,\n        \"max\": 4,\n        \"num_unique_values\": 4,\n        \"samples\": [\n          2,\n          4\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"annotator\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 20,\n        \"samples\": [\n          \"GNZuCtwed3\",\n          \"dqrONtdjbt\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"round\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 7,\n        \"samples\": [\n          \"1\",\n          \"2b\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"acl.id.matched\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 30098,\n        \"samples\": [\n          \"acl29061\",\n          \"acl15075\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 44
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### Import the third dataset"
      ],
      "metadata": {
        "id": "u6MpXiwCgju8"
      },
      "id": "u6MpXiwCgju8"
    },
    {
      "cell_type": "code",
      "source": [
        "# https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech\n",
        "\n",
        "df3 = load_dataset('ucberkeley-dlab/measuring-hate-speech', 'default')\n",
        "\n",
        "# Convert it to a DataFrame\n",
        "df3 = df3['train'].to_pandas()\n",
        "\n",
        "def classify_or_mark_for_test(row):\n",
        "    if row['hate_speech_score'] >= -0.25:\n",
        "        return 'Offensive or Hate Speech'\n",
        "    elif row['hate_speech_score'] < -0.25:\n",
        "        return 'Not Hate'\n",
        "    else:\n",
        "        return None\n",
        "\n",
        "# Apply the modified function\n",
        "df3['labels'] = df3.apply(classify_or_mark_for_test, axis=1)\n",
        "\n",
        "# Rename columns\n",
        "df3.rename(columns={'text': 'tweet'}, inplace=True)"
      ],
      "metadata": {
        "id": "AaO7r3wpgXlt"
      },
      "id": "AaO7r3wpgXlt",
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "df3['labels'].info\n",
        "df3.head()"
      ],
      "metadata": {
        "id": "RUM4jbgWjenb",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 360
        },
        "outputId": "9aeb7bf5-3239-41ec-acbd-2039d4e75bc2"
      },
      "id": "RUM4jbgWjenb",
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "   comment_id  annotator_id  platform  sentiment  respect  insult  humiliate  \\\n",
              "0       47777         10873         3        0.0      0.0     0.0        0.0   \n",
              "1       39773          2790         2        0.0      0.0     0.0        0.0   \n",
              "2       47101          3379         3        4.0      4.0     4.0        4.0   \n",
              "3       43625          7365         3        2.0      3.0     2.0        1.0   \n",
              "4       12538           488         0        4.0      4.0     4.0        4.0   \n",
              "\n",
              "   status  dehumanize  violence  ...  annotator_religion_jewish  \\\n",
              "0     2.0         0.0       0.0  ...                      False   \n",
              "1     2.0         0.0       0.0  ...                      False   \n",
              "2     4.0         4.0       0.0  ...                      False   \n",
              "3     2.0         0.0       0.0  ...                      False   \n",
              "4     4.0         4.0       4.0  ...                      False   \n",
              "\n",
              "   annotator_religion_mormon  annotator_religion_muslim  \\\n",
              "0                      False                      False   \n",
              "1                      False                      False   \n",
              "2                      False                      False   \n",
              "3                      False                      False   \n",
              "4                      False                      False   \n",
              "\n",
              "   annotator_religion_nothing annotator_religion_other  \\\n",
              "0                       False                    False   \n",
              "1                       False                    False   \n",
              "2                        True                    False   \n",
              "3                       False                    False   \n",
              "4                       False                    False   \n",
              "\n",
              "   annotator_sexuality_bisexual  annotator_sexuality_gay  \\\n",
              "0                         False                    False   \n",
              "1                         False                    False   \n",
              "2                         False                    False   \n",
              "3                         False                    False   \n",
              "4                         False                    False   \n",
              "\n",
              "   annotator_sexuality_straight  annotator_sexuality_other  \\\n",
              "0                          True                      False   \n",
              "1                          True                      False   \n",
              "2                          True                      False   \n",
              "3                          True                      False   \n",
              "4                          True                      False   \n",
              "\n",
              "                     labels  \n",
              "0                  Not Hate  \n",
              "1                  Not Hate  \n",
              "2  Offensive or Hate Speech  \n",
              "3  Offensive or Hate Speech  \n",
              "4  Offensive or Hate Speech  \n",
              "\n",
              "[5 rows x 132 columns]"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-ab4be724-c06c-46c7-a2b0-7abb047ddab3\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>comment_id</th>\n",
              "      <th>annotator_id</th>\n",
              "      <th>platform</th>\n",
              "      <th>sentiment</th>\n",
              "      <th>respect</th>\n",
              "      <th>insult</th>\n",
              "      <th>humiliate</th>\n",
              "      <th>status</th>\n",
              "      <th>dehumanize</th>\n",
              "      <th>violence</th>\n",
              "      <th>...</th>\n",
              "      <th>annotator_religion_jewish</th>\n",
              "      <th>annotator_religion_mormon</th>\n",
              "      <th>annotator_religion_muslim</th>\n",
              "      <th>annotator_religion_nothing</th>\n",
              "      <th>annotator_religion_other</th>\n",
              "      <th>annotator_sexuality_bisexual</th>\n",
              "      <th>annotator_sexuality_gay</th>\n",
              "      <th>annotator_sexuality_straight</th>\n",
              "      <th>annotator_sexuality_other</th>\n",
              "      <th>labels</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>47777</td>\n",
              "      <td>10873</td>\n",
              "      <td>3</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>2.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>...</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>True</td>\n",
              "      <td>False</td>\n",
              "      <td>Not Hate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>39773</td>\n",
              "      <td>2790</td>\n",
              "      <td>2</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>2.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>...</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>True</td>\n",
              "      <td>False</td>\n",
              "      <td>Not Hate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>47101</td>\n",
              "      <td>3379</td>\n",
              "      <td>3</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>...</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>True</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>True</td>\n",
              "      <td>False</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>43625</td>\n",
              "      <td>7365</td>\n",
              "      <td>3</td>\n",
              "      <td>2.0</td>\n",
              "      <td>3.0</td>\n",
              "      <td>2.0</td>\n",
              "      <td>1.0</td>\n",
              "      <td>2.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>0.0</td>\n",
              "      <td>...</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>True</td>\n",
              "      <td>False</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>12538</td>\n",
              "      <td>488</td>\n",
              "      <td>0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>4.0</td>\n",
              "      <td>...</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>False</td>\n",
              "      <td>True</td>\n",
              "      <td>False</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "<p>5 rows × 132 columns</p>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-ab4be724-c06c-46c7-a2b0-7abb047ddab3')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-ab4be724-c06c-46c7-a2b0-7abb047ddab3 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-ab4be724-c06c-46c7-a2b0-7abb047ddab3');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-cf77beb3-589d-42c3-a423-6e88f5dcd64b\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-cf77beb3-589d-42c3-a423-6e88f5dcd64b')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-cf77beb3-589d-42c3-a423-6e88f5dcd64b button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df3"
            }
          },
          "metadata": {},
          "execution_count": 46
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "a420ba1c"
      },
      "source": [
        "### Formated to two tables of tweets and labels"
      ],
      "id": "a420ba1c"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "5db5746b",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "outputId": "6af1b33a-f061-4612-d16d-76d022cdf443"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                                               tweet                    labels\n",
              "0  !!! RT @mayasolovely: As a woman you shouldn't...                  Not Hate\n",
              "1  !!!!! RT @mleew17: boy dats cold...tyga dwn ba...  Offensive or Hate Speech\n",
              "2  !!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...  Offensive or Hate Speech\n",
              "3  !!!!!!!!! RT @C_G_Anderson: @viva_based she lo...  Offensive or Hate Speech\n",
              "4  !!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...  Offensive or Hate Speech"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-a6b0e8bb-bdec-4f40-a457-e9de25abed49\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>tweet</th>\n",
              "      <th>labels</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>!!! RT @mayasolovely: As a woman you shouldn't...</td>\n",
              "      <td>Not Hate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>!!!!! RT @mleew17: boy dats cold...tyga dwn ba...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-a6b0e8bb-bdec-4f40-a457-e9de25abed49')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-a6b0e8bb-bdec-4f40-a457-e9de25abed49 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-a6b0e8bb-bdec-4f40-a457-e9de25abed49');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-e498e0a8-324c-45a9-81d5-bf5f30ff4767\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-e498e0a8-324c-45a9-81d5-bf5f30ff4767')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-e498e0a8-324c-45a9-81d5-bf5f30ff4767 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df1",
              "summary": "{\n  \"name\": \"df1\",\n  \"rows\": 24783,\n  \"fields\": [\n    {\n      \"column\": \"tweet\",\n      \"properties\": {\n        \"dtype\": \"string\",\n        \"num_unique_values\": 24783,\n        \"samples\": [\n          \"934 8616\\ni got a missed call from yo bitch\",\n          \"RT @KINGTUNCHI_: Fucking with a bad bitch you gone need some money lil homie!\",\n          \"RT @eanahS__: @1inkkofrosess lol my credit ain't no where near good , but I know the right man for the job .. that ho nice though!\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"labels\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"Offensive or Hate Speech\",\n          \"Not Hate\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 47
        }
      ],
      "source": [
        "df1 = df1[['tweet', 'labels']]\n",
        "df1 = df1[['tweet', 'labels']].fillna(0)\n",
        "df1.head()"
      ],
      "id": "5db5746b"
    },
    {
      "cell_type": "code",
      "source": [
        "df2 = df2[['tweet', 'labels']]\n",
        "\n",
        "# Manually set the label for rows with indexes 5 and 17 to \"Offensive or Hate Speech\"\n",
        "df2.loc[[5, 17], 'labels'] = 'Offensive or Hate Speech'\n",
        "df2.head()"
      ],
      "metadata": {
        "id": "VmXNfJ3jTGFV",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "outputId": "51c4e813-2fad-44c8-fdc9-c9e1a2c43f0a"
      },
      "id": "VmXNfJ3jTGFV",
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                                               tweet                    labels\n",
              "0                                dalits are lowlives  Offensive or Hate Speech\n",
              "1             gay people are a burden to our society  Offensive or Hate Speech\n",
              "2                              Arabs are not welcome  Offensive or Hate Speech\n",
              "3  I'm not saying we should actually eliminate he...  Offensive or Hate Speech\n",
              "4                       bananas are for black people  Offensive or Hate Speech"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-16384460-449e-47d3-b58f-92c0978fc0a7\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>tweet</th>\n",
              "      <th>labels</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>dalits are lowlives</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>gay people are a burden to our society</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Arabs are not welcome</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>I'm not saying we should actually eliminate he...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>bananas are for black people</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-16384460-449e-47d3-b58f-92c0978fc0a7')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-16384460-449e-47d3-b58f-92c0978fc0a7 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-16384460-449e-47d3-b58f-92c0978fc0a7');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-f0d3fc6a-ea7e-4b54-98b0-b72fd8b4fc34\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-f0d3fc6a-ea7e-4b54-98b0-b72fd8b4fc34')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-f0d3fc6a-ea7e-4b54-98b0-b72fd8b4fc34 button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df2",
              "summary": "{\n  \"name\": \"df2\",\n  \"rows\": 82399,\n  \"fields\": [\n    {\n      \"column\": \"tweet\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 41134,\n        \"samples\": [\n          \"Interestingly, we have a fuckton of Qs in our community and they are one of the biggest bulliers of LGBTs! Awful!\",\n          \"My main problems are depression and family issues.  Soceity is fucked when you have people like my neighbour living next door. HELP ME.\",\n          \"The world needs fewer muslims and more white supremacists\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    },\n    {\n      \"column\": \"labels\",\n      \"properties\": {\n        \"dtype\": \"category\",\n        \"num_unique_values\": 2,\n        \"samples\": [\n          \"Not Hate\",\n          \"Offensive or Hate Speech\"\n        ],\n        \"semantic_type\": \"\",\n        \"description\": \"\"\n      }\n    }\n  ]\n}"
            }
          },
          "metadata": {},
          "execution_count": 48
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "df3 = df3[['tweet', 'labels']]\n",
        "df3.head()"
      ],
      "metadata": {
        "id": "tFHq-la7lW85",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 206
        },
        "outputId": "42396c47-5555-4004-e1ba-6185048835c2"
      },
      "id": "tFHq-la7lW85",
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "                                               tweet                    labels\n",
              "0  Yes indeed. She sort of reminds me of the elde...                  Not Hate\n",
              "1  The trans women reading this tweet right now i...                  Not Hate\n",
              "2  Question: These 4 broads who criticize America...  Offensive or Hate Speech\n",
              "3  It is about time for all illegals to go back t...  Offensive or Hate Speech\n",
              "4  For starters bend over the one in pink and kic...  Offensive or Hate Speech"
            ],
            "text/html": [
              "\n",
              "  <div id=\"df-27fda134-4875-4ea1-bf8c-1e2efabc33f1\" class=\"colab-df-container\">\n",
              "    <div>\n",
              "<style scoped>\n",
              "    .dataframe tbody tr th:only-of-type {\n",
              "        vertical-align: middle;\n",
              "    }\n",
              "\n",
              "    .dataframe tbody tr th {\n",
              "        vertical-align: top;\n",
              "    }\n",
              "\n",
              "    .dataframe thead th {\n",
              "        text-align: right;\n",
              "    }\n",
              "</style>\n",
              "<table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              "    <tr style=\"text-align: right;\">\n",
              "      <th></th>\n",
              "      <th>tweet</th>\n",
              "      <th>labels</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <th>0</th>\n",
              "      <td>Yes indeed. She sort of reminds me of the elde...</td>\n",
              "      <td>Not Hate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>1</th>\n",
              "      <td>The trans women reading this tweet right now i...</td>\n",
              "      <td>Not Hate</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>2</th>\n",
              "      <td>Question: These 4 broads who criticize America...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>3</th>\n",
              "      <td>It is about time for all illegals to go back t...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <th>4</th>\n",
              "      <td>For starters bend over the one in pink and kic...</td>\n",
              "      <td>Offensive or Hate Speech</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table>\n",
              "</div>\n",
              "    <div class=\"colab-df-buttons\">\n",
              "\n",
              "  <div class=\"colab-df-container\">\n",
              "    <button class=\"colab-df-convert\" onclick=\"convertToInteractive('df-27fda134-4875-4ea1-bf8c-1e2efabc33f1')\"\n",
              "            title=\"Convert this dataframe to an interactive table.\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "  <svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\" viewBox=\"0 -960 960 960\">\n",
              "    <path d=\"M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z\"/>\n",
              "  </svg>\n",
              "    </button>\n",
              "\n",
              "  <style>\n",
              "    .colab-df-container {\n",
              "      display:flex;\n",
              "      gap: 12px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert {\n",
              "      background-color: #E8F0FE;\n",
              "      border: none;\n",
              "      border-radius: 50%;\n",
              "      cursor: pointer;\n",
              "      display: none;\n",
              "      fill: #1967D2;\n",
              "      height: 32px;\n",
              "      padding: 0 0 0 0;\n",
              "      width: 32px;\n",
              "    }\n",
              "\n",
              "    .colab-df-convert:hover {\n",
              "      background-color: #E2EBFA;\n",
              "      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "      fill: #174EA6;\n",
              "    }\n",
              "\n",
              "    .colab-df-buttons div {\n",
              "      margin-bottom: 4px;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert {\n",
              "      background-color: #3B4455;\n",
              "      fill: #D2E3FC;\n",
              "    }\n",
              "\n",
              "    [theme=dark] .colab-df-convert:hover {\n",
              "      background-color: #434B5C;\n",
              "      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);\n",
              "      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));\n",
              "      fill: #FFFFFF;\n",
              "    }\n",
              "  </style>\n",
              "\n",
              "    <script>\n",
              "      const buttonEl =\n",
              "        document.querySelector('#df-27fda134-4875-4ea1-bf8c-1e2efabc33f1 button.colab-df-convert');\n",
              "      buttonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "\n",
              "      async function convertToInteractive(key) {\n",
              "        const element = document.querySelector('#df-27fda134-4875-4ea1-bf8c-1e2efabc33f1');\n",
              "        const dataTable =\n",
              "          await google.colab.kernel.invokeFunction('convertToInteractive',\n",
              "                                                    [key], {});\n",
              "        if (!dataTable) return;\n",
              "\n",
              "        const docLinkHtml = 'Like what you see? Visit the ' +\n",
              "          '<a target=\"_blank\" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'\n",
              "          + ' to learn more about interactive tables.';\n",
              "        element.innerHTML = '';\n",
              "        dataTable['output_type'] = 'display_data';\n",
              "        await google.colab.output.renderOutput(dataTable, element);\n",
              "        const docLink = document.createElement('div');\n",
              "        docLink.innerHTML = docLinkHtml;\n",
              "        element.appendChild(docLink);\n",
              "      }\n",
              "    </script>\n",
              "  </div>\n",
              "\n",
              "\n",
              "<div id=\"df-fa7d1c32-0e19-43f4-9f2e-672bc25106ce\">\n",
              "  <button class=\"colab-df-quickchart\" onclick=\"quickchart('df-fa7d1c32-0e19-43f4-9f2e-672bc25106ce')\"\n",
              "            title=\"Suggest charts\"\n",
              "            style=\"display:none;\">\n",
              "\n",
              "<svg xmlns=\"http://www.w3.org/2000/svg\" height=\"24px\"viewBox=\"0 0 24 24\"\n",
              "     width=\"24px\">\n",
              "    <g>\n",
              "        <path d=\"M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z\"/>\n",
              "    </g>\n",
              "</svg>\n",
              "  </button>\n",
              "\n",
              "<style>\n",
              "  .colab-df-quickchart {\n",
              "      --bg-color: #E8F0FE;\n",
              "      --fill-color: #1967D2;\n",
              "      --hover-bg-color: #E2EBFA;\n",
              "      --hover-fill-color: #174EA6;\n",
              "      --disabled-fill-color: #AAA;\n",
              "      --disabled-bg-color: #DDD;\n",
              "  }\n",
              "\n",
              "  [theme=dark] .colab-df-quickchart {\n",
              "      --bg-color: #3B4455;\n",
              "      --fill-color: #D2E3FC;\n",
              "      --hover-bg-color: #434B5C;\n",
              "      --hover-fill-color: #FFFFFF;\n",
              "      --disabled-bg-color: #3B4455;\n",
              "      --disabled-fill-color: #666;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart {\n",
              "    background-color: var(--bg-color);\n",
              "    border: none;\n",
              "    border-radius: 50%;\n",
              "    cursor: pointer;\n",
              "    display: none;\n",
              "    fill: var(--fill-color);\n",
              "    height: 32px;\n",
              "    padding: 0;\n",
              "    width: 32px;\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart:hover {\n",
              "    background-color: var(--hover-bg-color);\n",
              "    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);\n",
              "    fill: var(--button-hover-fill-color);\n",
              "  }\n",
              "\n",
              "  .colab-df-quickchart-complete:disabled,\n",
              "  .colab-df-quickchart-complete:disabled:hover {\n",
              "    background-color: var(--disabled-bg-color);\n",
              "    fill: var(--disabled-fill-color);\n",
              "    box-shadow: none;\n",
              "  }\n",
              "\n",
              "  .colab-df-spinner {\n",
              "    border: 2px solid var(--fill-color);\n",
              "    border-color: transparent;\n",
              "    border-bottom-color: var(--fill-color);\n",
              "    animation:\n",
              "      spin 1s steps(1) infinite;\n",
              "  }\n",
              "\n",
              "  @keyframes spin {\n",
              "    0% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "      border-left-color: var(--fill-color);\n",
              "    }\n",
              "    20% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    30% {\n",
              "      border-color: transparent;\n",
              "      border-left-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    40% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-top-color: var(--fill-color);\n",
              "    }\n",
              "    60% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "    }\n",
              "    80% {\n",
              "      border-color: transparent;\n",
              "      border-right-color: var(--fill-color);\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "    90% {\n",
              "      border-color: transparent;\n",
              "      border-bottom-color: var(--fill-color);\n",
              "    }\n",
              "  }\n",
              "</style>\n",
              "\n",
              "  <script>\n",
              "    async function quickchart(key) {\n",
              "      const quickchartButtonEl =\n",
              "        document.querySelector('#' + key + ' button');\n",
              "      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.\n",
              "      quickchartButtonEl.classList.add('colab-df-spinner');\n",
              "      try {\n",
              "        const charts = await google.colab.kernel.invokeFunction(\n",
              "            'suggestCharts', [key], {});\n",
              "      } catch (error) {\n",
              "        console.error('Error during call to suggestCharts:', error);\n",
              "      }\n",
              "      quickchartButtonEl.classList.remove('colab-df-spinner');\n",
              "      quickchartButtonEl.classList.add('colab-df-quickchart-complete');\n",
              "    }\n",
              "    (() => {\n",
              "      let quickchartButtonEl =\n",
              "        document.querySelector('#df-fa7d1c32-0e19-43f4-9f2e-672bc25106ce button');\n",
              "      quickchartButtonEl.style.display =\n",
              "        google.colab.kernel.accessAllowed ? 'block' : 'none';\n",
              "    })();\n",
              "  </script>\n",
              "</div>\n",
              "\n",
              "    </div>\n",
              "  </div>\n"
            ],
            "application/vnd.google.colaboratory.intrinsic+json": {
              "type": "dataframe",
              "variable_name": "df3"
            }
          },
          "metadata": {},
          "execution_count": 49
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0e4048ff"
      },
      "source": [
        "### Now begins the process of cleaning the text"
      ],
      "id": "0e4048ff"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "236e992d"
      },
      "source": [
        "- `clean(text)` invokes a function to perform text cleaning\n",
        "- `text = re.sub('\\[.*?\\]', '', text)` uses the `re` module to replace all characters `('\\[.*?\\]') with whitespace."
      ],
      "id": "236e992d"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "48f9812a"
      },
      "source": [
        "#### `re.sub` works like this: import `re`\n",
        "\n",
        "text = \"Hello, world! This is a test string.\"\n",
        "\n",
        "#### Replace all occurrences of 'world' with 'planet'\n",
        "\n",
        "new_text = re.sub(r'world', 'planet', text)\n",
        "print(new_text)`"
      ],
      "id": "48f9812a"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "0a4a9cdf"
      },
      "source": [
        "- we use the same `re.sub` method to replace `'https?://\\S+|www\\.\\S+'` and `'<.*?>+'` and fill with `''`, or whitespace.\n",
        "- breaking down the `('[%s]' % re.escape(string.punctuation)` code, `string.punctuation` contains all of the punctuation characters. `re.escape` will escape any punctuation string and treat them as literals."
      ],
      "id": "0a4a9cdf"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4f3f70a2"
      },
      "source": [
        "- we put the `%s` in brackets `[]` because it is a charachter class, so the function will match any charachter contained in the  given charachter class.\n",
        "- `%s` concattenates strings together.\n",
        "- `text = re.sub('\\n', '', text)` removes all of the newline characters from the text and replaces it with whitespace."
      ],
      "id": "4f3f70a2"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5961fea1"
      },
      "source": [
        "### we will break down `text = re.sub('\\w*\\d\\w*', \"\", text)`\n",
        "- ` \\w` Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].\n",
        "- ` \\d` matches decimal digits 0-9\n",
        "- ` *` matches zero or more occurances of the preciding word character."
      ],
      "id": "5961fea1"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "d8fb9332"
      },
      "source": [
        "### `text = [word for word in text.split() if word not in stopword]` operates as such:\n",
        "\n",
        "1. `text.split()` splits input text into a list of words\n",
        "2. the function `[word for word in text.split() if word not in stopword]` creates a new list containing our words that are not stopwords.\n",
        "3. we then return the cleaned text with the  `return text` call,\n",
        "4. We then apply our function to the \"tweet\" variable.\n",
        "\n",
        "This was not a part of the tutorial, but I applied `df[\"tweet\"] = df[\"tweet\"].dropna()` to eliminate NaN values from the dataset.\n"
      ],
      "id": "d8fb9332"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "604ec08e"
      },
      "outputs": [],
      "source": [
        "def clean(text):\n",
        "    text = str(text).lower()\n",
        "    text = re.sub('\\[.*?\\]', '', text)\n",
        "    text = re.sub('https?://\\S+|www\\.\\S+', '', text)\n",
        "    text = re.sub('<.*?>+', '', text)\n",
        "    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)\n",
        "    text = re.sub('\\n', '', text)\n",
        "    text = re.sub('\\w*\\d\\w*', \"\", text)\n",
        "    text = [word for word in text.split() if word not in stopword]\n",
        "    text = \" \".join(text)\n",
        "    return text\n",
        "# Apply cleaning function to the 'tweet' column of both dataframes\n",
        "df1['tweet'] = df1['tweet'].apply(clean)"
      ],
      "id": "604ec08e"
    },
    {
      "cell_type": "code",
      "source": [
        "# Print the head of df1\n",
        "print(\"df1 head:\")\n",
        "print(df1.head())\n",
        "\n",
        "# Add a separator for better readability\n",
        "print(\"\\n\" + \"-\"*78 + \"\\n\")\n",
        "\n",
        "# Print the head of df2\n",
        "print(\"df2 head:\")\n",
        "print(df2.head())\n",
        "\n",
        "# Add a separator for better readability\n",
        "print(\"\\n\" + \"-\"*78 + \"\\n\")\n",
        "\n",
        "# Print the head of df3\n",
        "print(\"df3 head:\")\n",
        "print(df3.head())"
      ],
      "metadata": {
        "id": "mmh9Nc_GWzA3",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "9d467693-3c2c-4957-f605-9dd1e45e13a2"
      },
      "id": "mmh9Nc_GWzA3",
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "df1 head:\n",
            "                                               tweet                    labels\n",
            "0  rt mayasolovely woman shouldnt complain cleani...                  Not Hate\n",
            "1  rt boy dats coldtyga dwn bad cuffin dat hoe place  Offensive or Hate Speech\n",
            "2  rt urkindofbrand dawg rt ever fuck bitch start...  Offensive or Hate Speech\n",
            "3           rt cganderson vivabased look like tranny  Offensive or Hate Speech\n",
            "4  rt shenikaroberts shit hear might true might f...  Offensive or Hate Speech\n",
            "\n",
            "------------------------------------------------------------------------------\n",
            "\n",
            "df2 head:\n",
            "                                               tweet                    labels\n",
            "0                                dalits are lowlives  Offensive or Hate Speech\n",
            "1             gay people are a burden to our society  Offensive or Hate Speech\n",
            "2                              Arabs are not welcome  Offensive or Hate Speech\n",
            "3  I'm not saying we should actually eliminate he...  Offensive or Hate Speech\n",
            "4                       bananas are for black people  Offensive or Hate Speech\n",
            "\n",
            "------------------------------------------------------------------------------\n",
            "\n",
            "df3 head:\n",
            "                                               tweet                    labels\n",
            "0  Yes indeed. She sort of reminds me of the elde...                  Not Hate\n",
            "1  The trans women reading this tweet right now i...                  Not Hate\n",
            "2  Question: These 4 broads who criticize America...  Offensive or Hate Speech\n",
            "3  It is about time for all illegals to go back t...  Offensive or Hate Speech\n",
            "4  For starters bend over the one in pink and kic...  Offensive or Hate Speech\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7393229d"
      },
      "source": [
        "### Our data is ready. Now to build the classification model."
      ],
      "id": "7393229d"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7e467fcc"
      },
      "source": [
        "- we start by creating NumPy arrays of our dataset and labels. with `np.array(df1[])`"
      ],
      "id": "7e467fcc"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "46b4e571"
      },
      "source": [
        "- `CountVectorizer()` comverts our text data into a matrix of token accounts.\n",
        "\n",
        "- `cv.fit_transform` applies the `fit_transform` method of the CountVectorizeer with x.\n",
        "\n",
        "- fit_transform analyzes our data to find characteristcs and transforms the data based on the learned parameters."
      ],
      "id": "46b4e571"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "abf75e09"
      },
      "source": [
        "- Next, we create the variables \"X_train, X_test, y_train, y_test\" and assign them to `train_test_split`\n",
        "- We input our x and y values into `train_test_split` with a `test_size` of .33 and a `random_state` of 42\n",
        "- the `test_size` value .33 means that 33 % of the data will be used for testing, while 67 % will be used for training.\n",
        "- the `random_state` function randomly shuffles our train and test values to mitigate bias.\n",
        "- assigning `random_state` to 42 will produce the same results after multiple executions."
      ],
      "id": "abf75e09"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "cb9565bd"
      },
      "source": [
        "### Decision Tree Classifier\n",
        "\n",
        "- Decision tree chooses best feature (the root)\n",
        "- Next, it creates a split based on a feature\n",
        "- Then, the decision tree repeats the above steps until the leaf node is pure, meaning there are no more decisions to make.\n",
        "- We chose a Decision Tree because our data and what we want it to do (predict hate speech)."
      ],
      "id": "cb9565bd"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e156f5cc"
      },
      "source": [
        "![image.png](attachment:image.png)"
      ],
      "id": "e156f5cc"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "22e3bf28"
      },
      "source": [
        "#### Finally, we fit our model.\n",
        "\n",
        "- The `.fit` method takes the input features (X) and corresponding target labels (y) as arguments and uses them to train the model. In other words, fitting the model == training the model.\n",
        "\n",
        "- the line of code `clf.fit(X_train,y_train)` is equivalent to `DecisionTreeClassifier().fit(X_train,y_train)`.\n",
        "\n",
        "- We now have a trained classification model!\n"
      ],
      "id": "22e3bf28"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "d683c415",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 75
        },
        "outputId": "8393400f-fa3f-4dd6-947c-0e038efeab37"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "DecisionTreeClassifier()"
            ],
            "text/html": [
              "<style>#sk-container-id-2 {color: black;background-color: white;}#sk-container-id-2 pre{padding: 0;}#sk-container-id-2 div.sk-toggleable {background-color: white;}#sk-container-id-2 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-2 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-2 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-2 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-2 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-2 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-2 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-2 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-2 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-2 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-2 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-2 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-2 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-2 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-2 div.sk-item {position: relative;z-index: 1;}#sk-container-id-2 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-2 div.sk-item::before, #sk-container-id-2 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-2 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-2 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-2 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-2 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-2 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-2 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-2 div.sk-label-container {text-align: center;}#sk-container-id-2 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-2 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-2\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>DecisionTreeClassifier()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-2\" type=\"checkbox\" checked><label for=\"sk-estimator-id-2\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">DecisionTreeClassifier</label><div class=\"sk-toggleable__content\"><pre>DecisionTreeClassifier()</pre></div></div></div></div></div>"
            ]
          },
          "metadata": {},
          "execution_count": 52
        }
      ],
      "source": [
        "combined_df = pd.concat([df1, df2, df3], ignore_index=True)\n",
        "\n",
        "x = np.array(combined_df[\"tweet\"])\n",
        "y = np.array(combined_df[\"labels\"])\n",
        "\n",
        "cv = CountVectorizer()\n",
        "x = cv.fit_transform(x)\n",
        "\n",
        "X_train, X_test, y_train, y_test = train_test_split(x,y, test_size = 0.25, random_state = 42)\n",
        "clf = DecisionTreeClassifier()\n",
        "clf.fit(X_train,y_train)\n",
        "\n"
      ],
      "id": "d683c415"
    },
    {
      "cell_type": "code",
      "source": [
        "from sklearn.metrics import accuracy_score, classification_report\n",
        "\n",
        "y_pred = clf.predict(X_test)\n",
        "print(f\"Accuracy: {accuracy_score(y_test, y_pred)}\")\n",
        "print(classification_report(y_test, y_pred))"
      ],
      "metadata": {
        "id": "sdFRCXtGY3yI",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "97939191-203a-4c8b-f0da-ac30fc3945ee"
      },
      "id": "sdFRCXtGY3yI",
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Accuracy: 0.9212820301557222\n",
            "                          precision    recall  f1-score   support\n",
            "\n",
            "                Not Hate       0.91      0.92      0.91     27996\n",
            "Offensive or Hate Speech       0.93      0.92      0.93     32689\n",
            "\n",
            "                accuracy                           0.92     60685\n",
            "               macro avg       0.92      0.92      0.92     60685\n",
            "            weighted avg       0.92      0.92      0.92     60685\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "b25a0cf6"
      },
      "source": [
        "### Testing the model\n",
        "- test_data will be the words that are input to test if the words are offensive or not."
      ],
      "id": "b25a0cf6"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "9902038d"
      },
      "source": [
        "#### Let's break down the code `df = cv.transform([test_data]).toarray()` :\n"
      ],
      "id": "9902038d"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "45c25d08"
      },
      "source": [
        "- `cv.transform`  converts text data into a numerical representation suitable for machine learning algorithms to process.\n",
        "- We are transforming the `test_data` into a numerical vector representation using the `transform()` method.\n",
        "- Next, we use `.toarray()`  to convert the previously transformed test data into a Numpy Array. This is due to sklearn requires dense arrays as input."
      ],
      "id": "45c25d08"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "14394f22"
      },
      "source": [
        "#### Lastly, we will make predictions using the trained model.\n",
        "\n",
        "- The  `predict` method uses the data from `df` to return predicted values in an array containing the different values or labels for each data point.\n",
        "- In the case of our Hate speech ML Model, we use the `predict()` method to predict weather the text entered falls within our labels \"Hate Speech Detected\", \"Offensive language detected\", or \"No hate and offensive speech\".\n",
        "- `clf.predict(df)` means that the `decsion_tree_classifier` uses the `predict` method to take the features from our `df` array and make predictions based on learned patterns and relationships captured by the model during training."
      ],
      "id": "14394f22"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7cb51e7d"
      },
      "source": [
        "### Below is the code that the video wanted me to use to test this model. I will write a function later  so we do not have to re-type the test data  string every time.\n",
        "- If you would like to test out the model, edit the words in the \"I will kill you\" string.\n",
        "- This model is not perfect; test it out and document what works and what does not."
      ],
      "id": "7cb51e7d"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4c8d6a3b",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "f4c6a6e2-19dd-43ed-ec4d-0a2c5bcc4440"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "['Offensive or Hate Speech']\n"
          ]
        }
      ],
      "source": [
        "test_data = \"Arabs welcome\"\n",
        "combined_df = cv.transform([test_data]).toarray()\n",
        "print(clf.predict(combined_df))"
      ],
      "id": "4c8d6a3b"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "2d0fbb0d"
      },
      "source": [
        "# Conclusion of  Hate Speech Detection project"
      ],
      "id": "2d0fbb0d"
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "7891f059"
      },
      "source": [
        "- I learned what many modules in sklearn do, such as `train_test_split()`,`fit()` and  `CountVectorizer`.\n",
        "- I learned about the nltk library, which is quite useful for NLP and text processing tasks.\n",
        "- I learned about decision trees and stopwords.\n",
        "- I learned `.tolist()` converts NumPy arrays into Python lists.\n",
        "- I learned How to use the `map()` function\n",
        "- I learned much about the `re.sub` module for text cleaning\n",
        "- I learned why we use the values .33 and 42 as values in our decision tree model.\n",
        "- I learned how to train a test set. For this lab, it was done by:\n",
        "`X_train, X_test, y_train, y_test = train_test_split(x,y, test_size = 0.33, random_state = 42)`\n",
        "- I learned that the data needs to be transformed into a NumPy array with `.toarray()` before we can make predictions.\n",
        "- I learned  that the `.predict` function predicts values based on the trained classifier. In this case, it was `clf`.\n",
        "\n",
        "\n",
        "\n"
      ],
      "id": "7891f059"
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "fb36a279"
      },
      "outputs": [],
      "source": [],
      "id": "fb36a279"
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.5"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}