{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "DRi1KwDDZz3a" }, "source": [ "# Downloading The Data\n", "The data for this project is downloaded from Kaggle, a well-known data science platform. To reproduce this notebook, follow the steps explained in [this article](https://www.analyticsvidhya.com/blog/2021/06/how-to-load-kaggle-datasets-directly-into-google-colab/).\n", "\n", "After downloading your Kaggle credentials, upload the kaggle.json file to a folder called `Kaggle` in your Google Drive (the folder name must match the path used in the next cell)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "eTNtUJkEaJXk" }, "outputs": [], "source": [ "# Mount Google Drive so the stored credentials are reachable\n", "from google.colab import drive\n", "drive.mount('/content/gdrive')\n", "\n", "# Copy the API token out of Drive\n", "!cp '/content/gdrive/My Drive/Kaggle/kaggle.json' kaggle.json\n", "\n", "# Install the Kaggle CLI and put the token where it looks for it\n", "! pip install kaggle\n", "! mkdir -p ~/.kaggle\n", "! cp kaggle.json ~/.kaggle/\n", "! chmod 600 ~/.kaggle/kaggle.json\n", "\n", "# Download the dataset archive into the current directory\n", "! kaggle datasets download -d saurabhshahane/ecommerce-text-classification" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "KZ_HtjFDpUsL" }, "outputs": [], "source": [ "! unzip /content/ecommerce-text-classification.zip -d /content/data" ] }, { "cell_type": "markdown", "source": [ "# Introduction\n", "In this notebook we will fine-tune a **BERT** model for text classification on **e-commerce category data**.\n", "We have 4 categories: **Electronics**, **Household**, **Books** and **Clothing & Accessories**.\n", "\n", "### Metrics\n", "We'll use **Precision**, **Recall**, **F1-score** and **Accuracy**.\n", "\n", "### Strategy Overview\n", "The main library used in this notebook is **transformers** from **Hugging Face**. The framework is **TensorFlow**, and we fine-tune the **distilbert-base-uncased** checkpoint from **Hugging Face**, a distilled version of BERT, for text classification." 
], "metadata": { "id": "cPuZWyvwhhbF" } }, { "cell_type": "markdown", "source": [ "# Packages" ], "metadata": { "id": "SHFaGM2ff-3X" } }, { "cell_type": "markdown", "source": [ "We'll use these packages:\n", "\n", "* **datasets** for loading the data into a format that transformers can consume.\n", "* **transformers**, which provides a variety of NLP functionality.\n", "* **evaluate** for model evaluation.\n", "* **seqeval** for the metrics used for evaluation.\n", "* **seaborn** for data visualisation." ], "metadata": { "id": "LvkcQ8AmgChy" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ODGTcxKabJtK" }, "outputs": [], "source": [ "! pip install datasets\n", "! pip install transformers\n", "! pip install evaluate\n", "! pip install seqeval" ] }, { "cell_type": "code", "source": [ "import tensorflow as tf\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns" ], "metadata": { "id": "4IlTLKSKg4Cx" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Data Preprocessing" ], "metadata": { "id": "OFFqNJbsN8Dj" } }, { "cell_type": "markdown", "source": [ "## Missing Values" ], "metadata": { "id": "3KcEvH4Re2Uh" } }, { "cell_type": "markdown", "source": [ "Our data has 2 columns: **label** and **text**." 
], "metadata": { "id": "xVDTZdnCNywQ" } }, { "cell_type": "code", "source": [ "dataset_df = pd.read_csv(\"/content/data/ecommerceDataset.csv\")\n", "dataset_df = pd.DataFrame({'label': dataset_df.iloc[:,0] , 'text': dataset_df.iloc[:,1]})\n", "dataset_df.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 206 }, "id": "j1071yyIN6lw", "outputId": "0aef50ae-7393-4e48-fe4e-8a4bea2d7215" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " label text\n", "0 Household SAF 'Floral' Framed Painting (Wood, 30 inch x ...\n", "1 Household SAF 'UV Textured Modern Art Print Framed' Pain...\n", "2 Household SAF Flower Print Framed Painting (Synthetic, 1...\n", "3 Household Incredible Gifts India Wooden Happy Birthday U...\n", "4 Household Pitaara Box Romantic Venice Canvas Painting 6m..." ], "text/html": [ "\n", "
\n", "| | label | text |\n", "|---|---|---|\n", "| 0 | Household | SAF 'Floral' Framed Painting (Wood, 30 inch x ... |\n", "| 1 | Household | SAF 'UV Textured Modern Art Print Framed' Pain... |\n", "| 2 | Household | SAF Flower Print Framed Painting (Synthetic, 1... |\n", "| 3 | Household | Incredible Gifts India Wooden Happy Birthday U... |\n", "| 4 | Household | Pitaara Box Romantic Venice Canvas Painting 6m... |\n", "