Spaces: Mahesh9/CFPB-Complaint-Classifier

Mahesh Babu committed on May 3

Commit 26c2106 • 1 Parent(s): 7467c1e

added modeling files

Files changed (31)

subproduct_prediction/.DS_Store +0 -0
subproduct_prediction/.ipynb_checkpoints/Pipeline Modified-checkpoint.ipynb +421 -0
subproduct_prediction/.ipynb_checkpoints/Pipeline-checkpoint.ipynb +192 -0
subproduct_prediction/.ipynb_checkpoints/Sub_Issue-checkpoint.ipynb +1900 -0
subproduct_prediction/.ipynb_checkpoints/Sub_Issues-modified-checkpoint.ipynb +990 -0
subproduct_prediction/Pipeline.ipynb +0 -0
subproduct_prediction/Sub_Issue.ipynb +990 -0
subproduct_prediction/Sub_Product.ipynb +700 -0
subproduct_prediction/issue_models/account_operations_and_unauthorized_transaction_issues.pkl +3 -0
subproduct_prediction/issue_models/attempts_to_collect_debt_not_owed.pkl +3 -0
subproduct_prediction/issue_models/closing_an_account.pkl +3 -0
subproduct_prediction/issue_models/closing_your_account.pkl +3 -0
subproduct_prediction/issue_models/credit_report_and_monitoring_issues.pkl +3 -0
subproduct_prediction/issue_models/dealing_with_your_lender_or_servicer.pkl +3 -0
subproduct_prediction/issue_models/disputes_and_misrepresentations.pkl +3 -0
subproduct_prediction/issue_models/improper_use_of_your_report.pkl +3 -0
subproduct_prediction/issue_models/incorrect_information_on_your_report.pkl +3 -0
subproduct_prediction/issue_models/legal_and_threat_actions.pkl +3 -0
subproduct_prediction/issue_models/managing_an_account.pkl +3 -0
subproduct_prediction/issue_models/payment_and_funds_management.pkl +3 -0
subproduct_prediction/issue_models/problem_with_a_company's_investigation_into_an_existing_issue.pkl +3 -0
subproduct_prediction/issue_models/problem_with_a_company's_investigation_into_an_existing_problem.pkl +3 -0
subproduct_prediction/issue_models/problem_with_a_credit_reporting_company's_investigation_into_an_existing_problem.pkl +3 -0
subproduct_prediction/issue_models/problem_with_a_purchase_shown_on_your_statement.pkl +3 -0
subproduct_prediction/issue_models/written_notification_about_debt.pkl +3 -0
subproduct_prediction/models/Checking_saving_model.pkl +3 -0
subproduct_prediction/models/Credit_Prepaid_Card_model.pkl +3 -0
subproduct_prediction/models/Credit_Reporting_model.pkl +3 -0
subproduct_prediction/models/Debt_model.pkl +3 -0
subproduct_prediction/models/Product_model.pkl +3 -0
subproduct_prediction/models/loan_model.pkl +3 -0

subproduct_prediction/.DS_Store ADDED

Binary file (6.15 kB).

subproduct_prediction/.ipynb_checkpoints/Pipeline Modified-checkpoint.ipynb ADDED

	@@ -0,0 +1,421 @@

+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "299ffd7f-502b-4183-9536-4e47654baae8",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Importing the necessary libraries"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "e27f22a3-f39e-4007-a048-56ccc9af915e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "import pickle\n",
+    "import pandas as pd\n",
+    "from tqdm import tqdm\n",
+    "from sklearn.pipeline import Pipeline\n",
+    "from transformers import pipeline\n",
+    "from sklearn.metrics import accuracy_score, precision_score"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7b6553e4-339a-4003-b6f9-4aa52d2818c0",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Loading 5 product models"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "bd40a9e0-faab-4999-9ad5-f74e7ae8b272",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('models/Credit_Reporting_model.pkl', 'rb') as f:\n",
+    "   trained_model_cr= pickle.load(f)\n",
+    "\n",
+    "with open('models/Credit_Prepaid_Card_model.pkl', 'rb') as f:\n",
+    "   trained_model_cp= pickle.load(f)\n",
+    "\n",
+    "with open('models/Checking_saving_model.pkl', 'rb') as f:\n",
+    "    trained_model_cs=pickle.load(f)\n",
+    "\n",
+    "with open('models/loan_model.pkl', 'rb') as f:\n",
+    "   trained_model_l= pickle.load(f)\n",
+    "\n",
+    "with open('models/Debt_model.pkl', 'rb') as f:\n",
+    "   trained_model_d= pickle.load(f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8dd19c5a-5e4f-457c-88b7-5efa18964a8b",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Loading 17 issue models"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "3dae2131-cfa4-4887-a30a-00d6caf547e8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Path to the models and their corresponding names\n",
+    "issue_model_files = {\n",
+    "    'trained_model_account_operations': 'issue_models/account_operations_and_unauthorized_transaction_issues.pkl',\n",
+    "    'trained_model_collect_debt': 'issue_models/attempts_to_collect_debt_not_owed.pkl',\n",
+    "    'trained_model_closing_account': 'issue_models/closing_an_account.pkl',\n",
+    "    'trained_model_closing_your_account': 'issue_models/closing_your_account.pkl',\n",
+    "    'trained_model_credit_report': 'issue_models/credit_report_and_monitoring_issues.pkl',\n",
+    "    'trained_model_lender': 'issue_models/dealing_with_your_lender_or_servicer.pkl',\n",
+    "    'trained_model_disputes': 'issue_models/disputes_and_misrepresentations.pkl',\n",
+    "    'trained_model_improper_use_report': 'issue_models/improper_use_of_your_report.pkl',\n",
+    "    'trained_model_incorrect_info': 'issue_models/incorrect_information_on_your_report.pkl',\n",
+    "    'trained_model_legal_and_threat': 'issue_models/legal_and_threat_actions.pkl',\n",
+    "    'trained_model_managing_account': 'issue_models/managing_an_account.pkl',\n",
+    "    'trained_model_payment_funds': 'issue_models/payment_and_funds_management.pkl',\n",
+    "    'trained_model_investigation_wrt_issue': 'issue_models/problem_with_a_company\\'s_investigation_into_an_existing_issue.pkl',\n",
+    "    'trained_model_investigation_wrt_problem': 'issue_models/problem_with_a_company\\'s_investigation_into_an_existing_problem.pkl',\n",
+    "    'trained_model_credit_investigation_wrt_problem': 'issue_models/problem_with_a_credit_reporting_company\\'s_investigation_into_an_existing_problem.pkl',\n",
+    "    'trained_model_purchase_shown': 'issue_models/problem_with_a_purchase_shown_on_your_statement.pkl',\n",
+    "    'trained_model_notification_about_debt': 'issue_models/written_notification_about_debt.pkl',\n",
+    "}\n",
+    "\n",
+    "issue_models = {}\n",
+    "\n",
+    "for model_name, file_path in issue_model_files.items():\n",
+    "    with open(file_path, 'rb') as f:\n",
+    "        issue_models[model_name] = pickle.load(f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bf41b143-2ff3-4a79-83a9-afcc0d352dd0",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### LLM to classify the product based on the narrative"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "b946427b-b259-4eb2-a40b-ed7b7e476354",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "device = \"mps\" if torch.backends.mps.is_available() else \"cpu\"\n",
+    "\n",
+    "# Define the pipeline for classifying product\n",
+    "product_classifier = pipeline(\"text-classification\", model=\"Mahesh9/distil-bert-fintuned-product-cfpb-complaints\",\n",
+    "                              max_length = 512, truncation = True, device = device)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0f0c40cd-f23e-4e0a-8c03-34b517a4c727",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Function to choose the appropriate product model to classify the sub-product"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "619d9c58-1a83-4279-b452-63f3cb69998f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Define a function to select the appropriate subproduct prediction model based on the predicted product\n",
+    "def select_subproduct_model(predicted_product):\n",
+    "    if predicted_product == 'Credit Reporting' :\n",
+    "        return trained_model_cr\n",
+    "    elif predicted_product == 'Credit/Prepaid Card':\n",
+    "        return trained_model_cp\n",
+    "    elif predicted_product == 'Checking or savings account':\n",
+    "        return trained_model_cs\n",
+    "    elif predicted_product == 'Loans / Mortgage':\n",
+    "        return trained_model_l\n",
+    "    elif predicted_product == 'Debt collection':\n",
+    "        return trained_model_d\n",
+    "    else:\n",
+    "        raise ValueError(\"Invalid predicted product category\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2f361468-ab6d-4d9a-a665-2c9dbce42e93",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### LLM to classify the issue based on the narrative"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "0a8da273-8dfb-43b8-abf9-cf06871f2763",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Define the pipeline for classifying issue\n",
+    "issue_classifier = pipeline(\"text-classification\", model=\"Mahesh9/distil-bert-fintuned-issues-cfpb-complaints\",\n",
+    "                            max_length = 512, truncation = True, device = device)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "df05c0c0-c4cc-4287-b129-75f60dd88348",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Function to choose the appropriate issue model to classify the sub-issue"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "f55a787b-ce6a-49dd-96dd-1cbfda8a68a5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Define a function to select the appropriate subissue prediction model based on the predicted issue\n",
+    "def select_subissue_model(predicted_issue):\n",
+    "    if predicted_issue == \"Problem with a company's investigation into an existing problem\":\n",
+    "        return issue_models['trained_model_investigation_wrt_problem']\n",
+    "        \n",
+    "    elif predicted_issue == \"Problem with a credit reporting company's investigation into an existing problem\":\n",
+    "        return issue_models['trained_model_credit_investigation_wrt_problem']\n",
+    "\n",
+    "    elif predicted_issue == \"Problem with a company's investigation into an existing issue\":\n",
+    "        return issue_models['trained_model_investigation_wrt_issue']\n",
+    "\n",
+    "    elif predicted_issue == \"Problem with a purchase shown on your statement\":\n",
+    "        return issue_models['trained_model_purchase_shown']\n",
+    "\n",
+    "    elif predicted_issue == \"Incorrect information on your report\":\n",
+    "        return issue_models['trained_model_incorrect_info']\n",
+    "        \n",
+    "    elif predicted_issue == \"Improper use of your report\":\n",
+    "        return issue_models['trained_model_improper_use_report']\n",
+    "\n",
+    "    elif predicted_issue == \"Account Operations and Unauthorized Transaction Issues\":\n",
+    "        return issue_models['trained_model_account_operations']\n",
+    "        \n",
+    "    elif predicted_issue == \"Payment and Funds Management\":\n",
+    "        return issue_models['trained_model_payment_funds']\n",
+    "\n",
+    "    elif predicted_issue == \"Managing an account\":\n",
+    "        return issue_models['trained_model_managing_account']\n",
+    "        \n",
+    "    elif predicted_issue == \"Attempts to collect debt not owed\":\n",
+    "        return issue_models['trained_model_collect_debt']\n",
+    "\n",
+    "    elif predicted_issue == \"Written notification about debt\":\n",
+    "        return issue_models['trained_model_notification_about_debt']\n",
+    "        \n",
+    "    elif predicted_issue == \"Dealing with your lender or servicer\":\n",
+    "        return issue_models['trained_model_lender']\n",
+    "\n",
+    "    elif predicted_issue == \"Disputes and Misrepresentations\":\n",
+    "        return issue_models['trained_model_disputes']\n",
+    "        \n",
+    "    elif predicted_issue == \"Closing your account\":\n",
+    "        return issue_models['trained_model_closing_your_account']\n",
+    "\n",
+    "    elif predicted_issue == \"Closing an account\":\n",
+    "        return issue_models['trained_model_closing_account']\n",
+    "        \n",
+    "    elif predicted_issue == \"Credit Report and Monitoring Issues\":\n",
+    "        return issue_models['trained_model_credit_report']\n",
+    "\n",
+    "    elif predicted_issue == \"Legal and Threat Actions\":\n",
+    "        return issue_models['trained_model_legal_and_threat']\n",
+    "        \n",
+    "    else:\n",
+    "        raise ValueError(\"Invalid predicted issue category\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d87974e1-1bf8-44ea-bfee-75de8e2960b4",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Driver code to classify the complaint into various categories"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "dc785511-d68f-4341-a080-23f8f27eefc4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def classify_complaint(narrative):\n",
+    "    # Predict product category\n",
+    "    predicted_product = product_classifier(narrative)[0]['label']\n",
+    "    \n",
+    "    # Select the already-loaded subproduct model for the predicted product\n",
+    "    subproduct_model = select_subproduct_model(predicted_product)\n",
+    "    # Predict subproduct category using the selected model\n",
+    "    predicted_subproduct = subproduct_model.predict([narrative])[0]\n",
+    "\n",
+    "\n",
+    "    \n",
+    "    # Predict the appropriate issue category using the narrative\n",
+    "    predicted_issue = issue_classifier(narrative)[0]['label']\n",
+    "    \n",
+    "    # Select the already-loaded subissue model for the predicted issue\n",
+    "    subissue_model = select_subissue_model(predicted_issue)\n",
+    "    # Predict subissue category using the selected model\n",
+    "    predicted_subissue = subissue_model.predict([narrative])[0]\n",
+    "    \n",
+    "    return {\n",
+    "        \"Product\" : predicted_product,\n",
+    "        \"Sub-product\" : predicted_subproduct,\n",
+    "        \"Issue\" : predicted_issue,\n",
+    "        \"Sub-issue\" : predicted_subissue\n",
+    "    }"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "982521ea-364e-4521-889e-fe586c186701",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'Product': 'Credit/Prepaid Card',\n",
+       " 'Sub-product': 'General-purpose credit card or charge card',\n",
+       " 'Issue': \"Problem with a company's investigation into an existing problem\",\n",
+       " 'Sub-issue': 'Was not notified of investigation status or results'}"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "narrative = \"\"\"It is absurd that I have consistently made timely payments for this account and have never been\n",
+    "             overdue. I kindly request that you promptly update my account to reflect this accurately.\"\"\"\n",
+    "\n",
+    "classify_complaint(narrative)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6a68ebbc-de80-4176-ac38-bfe5fd84b86c",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Evaluation on external test set"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "88529ef1-6ed2-41b9-a266-e550a50b831f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Load the test dataset\n",
+    "test_data = pd.read_csv('../data_splits/test-data-split.csv')  \n",
+    "\n",
+    "# Initialize lists to store predicted and actual labels\n",
+    "predicted_products = []\n",
+    "predicted_subproducts = []\n",
+    "predicted_issues = []\n",
+    "predicted_subissues = []\n",
+    "\n",
+    "actual_products = test_data['Product']\n",
+    "actual_subproducts = test_data['Sub-product']\n",
+    "actual_issues = test_data['Issue']\n",
+    "actual_subissues = test_data['Sub-issue']\n",
+    "\n",
+    "# Iterate over each complaint narrative in the test set\n",
+    "for narrative in tqdm(test_data['Consumer complaint narrative']):\n",
+    "    # Predict all four labels using the classify_complaint function\n",
+    "    prediction = classify_complaint(narrative)\n",
+    "    \n",
+    "    # Append predicted labels to lists\n",
+    "    predicted_products.append(prediction['Product'])\n",
+    "    predicted_subproducts.append(prediction['Sub-product'])\n",
+    "    predicted_issues.append(prediction['Issue'])\n",
+    "    predicted_subissues.append(prediction['Sub-issue'])\n",
+    "    \n",
+    "# Calculate accuracy and precision\n",
+    "accuracy_product = accuracy_score(actual_products, predicted_products)\n",
+    "precision_product = precision_score(actual_products, predicted_products, average='macro',zero_division=1)\n",
+    "accuracy_subproduct = accuracy_score(actual_subproducts, predicted_subproducts)\n",
+    "precision_subproduct = precision_score(actual_subproducts, predicted_subproducts, average='macro',zero_division=1)\n",
+    "\n",
+    "accuracy_issue = accuracy_score(actual_issues, predicted_issues)\n",
+    "precision_issue = precision_score(actual_issues, predicted_issues, average='macro',zero_division=1)\n",
+    "accuracy_subissue = accuracy_score(actual_subissues, predicted_subissues)\n",
+    "precision_subissue = precision_score(actual_subissues, predicted_subissues, average='macro',zero_division=1)\n",
+    "\n",
+    "\n",
+    "# Print the results\n",
+    "print(\"Product Prediction Accuracy:\", accuracy_product)\n",
+    "print(\"Product Prediction Precision:\", precision_product)\n",
+    "\n",
+    "print(\"Subproduct Prediction Accuracy:\", accuracy_subproduct)\n",
+    "print(\"Subproduct Prediction Precision:\", precision_subproduct)\n",
+    "\n",
+    "print(\"Issue Prediction Accuracy:\", accuracy_issue)\n",
+    "print(\"Issue Prediction Precision:\", precision_issue)\n",
+    "\n",
+    "print(\"Sub-issue Prediction Accuracy:\", accuracy_subissue)\n",
+    "print(\"Sub-issue Prediction Precision:\", precision_subissue)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.19"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
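
Note on the notebook above: `select_subissue_model` routes each predicted issue label to a pickled model through a long `if`/`elif` chain. The same routing could be written as a dictionary lookup. The following is a minimal sketch under stated assumptions, not the author's code: `StubModel`, `ISSUE_TO_MODEL_KEY`, and the placeholder sub-issue strings are hypothetical, and only three of the 17 labels are shown.

```python
# Sketch only: dictionary-based routing from a predicted issue label to a
# sub-issue model. StubModel is a hypothetical stand-in for the pickled
# scikit-learn pipelines loaded in the notebook.

class StubModel:
    """Exposes the same predict() shape as a fitted pipeline."""

    def __init__(self, answer):
        self.answer = answer

    def predict(self, narratives):
        # One prediction per input narrative.
        return [self.answer for _ in narratives]


# Issue label -> key in the issue_models dict (three of the 17 labels).
ISSUE_TO_MODEL_KEY = {
    "Incorrect information on your report": "trained_model_incorrect_info",
    "Improper use of your report": "trained_model_improper_use_report",
    "Managing an account": "trained_model_managing_account",
}

# In the notebook these values come from pickle.load(); the strings here
# are placeholders, not real model output.
issue_models = {
    "trained_model_incorrect_info": StubModel("placeholder sub-issue A"),
    "trained_model_improper_use_report": StubModel("placeholder sub-issue B"),
    "trained_model_managing_account": StubModel("placeholder sub-issue C"),
}


def select_subissue_model(predicted_issue):
    """Return the sub-issue model for a label, or raise ValueError."""
    key = ISSUE_TO_MODEL_KEY.get(predicted_issue)
    if key is None:
        raise ValueError(f"Invalid predicted issue category: {predicted_issue!r}")
    return issue_models[key]


model = select_subissue_model("Managing an account")
print(model.predict(["example narrative"])[0])
```

With this layout, adding a new issue means adding one dictionary entry, and an unknown label still fails with the same `ValueError` the notebook raises.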

subproduct_prediction/.ipynb_checkpoints/Pipeline-checkpoint.ipynb ADDED

	@@ -0,0 +1,192 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "e27f22a3-f39e-4007-a048-56ccc9af915e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pickle"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "bd40a9e0-faab-4999-9ad5-f74e7ae8b272",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('models/Credit_Reporting_model.pkl', 'rb') as f:\n",
+    "   trained_model_cr= pickle.load(f)\n",
+    "\n",
+    "with open('models/Credit_Prepaid_Card_model.pkl', 'rb') as f:\n",
+    "   trained_model_cp= pickle.load(f)\n",
+    "\n",
+    "with open('models/Checking_saving_model.pkl', 'rb') as f:\n",
+    "    trained_model_cs=pickle.load(f)\n",
+    "\n",
+    "with open('models/loan_model.pkl', 'rb') as f:\n",
+    "   trained_model_l= pickle.load(f)\n",
+    "\n",
+    "with open('models/Debt_model.pkl', 'rb') as f:\n",
+    "   trained_model_d= pickle.load(f)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "d1ad5fb9-36bf-4637-a137-17fca19224f6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('models/Product_model.pkl', 'rb') as f:\n",
+    "   product_model= pickle.load(f)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "b946427b-b259-4eb2-a40b-ed7b7e476354",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.pipeline import Pipeline\n",
+    "\n",
+    "# Define the pipeline steps\n",
+    "trained_product_model=product_model\n",
+    "\n",
+    "\n",
+    "# Define a function to select the appropriate subproduct prediction model based on the predicted product\n",
+    "def select_subproduct_model(predicted_product):\n",
+    "    if predicted_product == 'Credit Reporting' :\n",
+    "        return trained_model_cr\n",
+    "    elif predicted_product == 'Credit/Prepaid Card':\n",
+    "        return trained_model_cp\n",
+    "    elif predicted_product == 'Checking or savings account':\n",
+    "        return trained_model_cs\n",
+    "    elif predicted_product == 'Loans / Mortgage':\n",
+    "        return trained_model_l\n",
+    "    elif predicted_product == 'Debt collection':\n",
+    "        return trained_model_d\n",
+    "    else:\n",
+    "        raise ValueError(\"Invalid predicted product category\")\n",
+    "\n",
+    "def custom_predict(narrative):\n",
+    "    # Predict product category\n",
+    "    predicted_product = product_model.predict([narrative])[0]\n",
+    "    \n",
+    "    # Load the appropriate subproduct prediction model\n",
+    "    subproduct_model = select_subproduct_model(predicted_product)\n",
+    "    \n",
+    "    # Predict subproduct category using the selected model\n",
+    "    predicted_subproduct = subproduct_model.predict([narrative])\n",
+    "    return predicted_product, predicted_subproduct"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "982521ea-364e-4521-889e-fe586c186701",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Predicted product: Credit/Prepaid Card\n",
+      "Predicted subproduct: ['Checking account']\n"
+     ]
+    }
+   ],
+   "source": [
+    "narrative = \"I have a problem with my credit card bill.\"\n",
+    "#narrative = \"it is absurd that i have consistently made timely payments for this account and have never been overdue. i kindly request that you promptly update my account to reflect this accurately.\"\n",
+    "predicted_product, predicted_subproduct = custom_predict(narrative)\n",
+    "print(\"Predicted product:\", predicted_product)\n",
+    "print(\"Predicted subproduct:\", predicted_subproduct)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "88529ef1-6ed2-41b9-a266-e550a50b831f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Product Prediction Accuracy: 0.9110859728506787\n",
+      "Product Prediction Precision: 0.6634108079865927\n",
+      "Subproduct Prediction Accuracy: 0.8377989657401422\n",
+      "Subproduct Prediction Precision: 0.5058767033148038\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.metrics import accuracy_score, precision_score\n",
+    "import pandas as pd\n",
+    "\n",
+    "# Load the test dataset\n",
+    "test_data = pd.read_csv('../data_splits/test-data-split.csv')  \n",
+    "\n",
+    "# Initialize lists to store predicted and actual labels\n",
+    "predicted_products = []\n",
+    "predicted_subproducts = []\n",
+    "actual_products = test_data['Product']\n",
+    "actual_subproducts = test_data['Sub-product']\n",
+    "\n",
+    "# Iterate over each complaint narrative in the test set\n",
+    "for narrative in test_data['Consumer complaint narrative']:\n",
+    "    # Predict product and subproduct using the custom_predict function\n",
+    "    predicted_product, predicted_subproduct = custom_predict(narrative)\n",
+    "    \n",
+    "    # Append predicted labels to lists\n",
+    "    predicted_products.append(predicted_product)\n",
+    "    predicted_subproducts.append(predicted_subproduct)\n",
+    "\n",
+    "# Calculate accuracy and precision\n",
+    "accuracy_product = accuracy_score(actual_products, predicted_products)\n",
+    "precision_product = precision_score(actual_products, predicted_products, average='macro',zero_division=1)\n",
+    "accuracy_subproduct = accuracy_score(actual_subproducts, predicted_subproducts)\n",
+    "precision_subproduct = precision_score(actual_subproducts, predicted_subproducts, average='macro',zero_division=1)\n",
+    "\n",
+    "# Print the results\n",
+    "print(\"Product Prediction Accuracy:\", accuracy_product)\n",
+    "print(\"Product Prediction Precision:\", precision_product)\n",
+    "print(\"Subproduct Prediction Accuracy:\", accuracy_subproduct)\n",
+    "print(\"Subproduct Prediction Precision:\", precision_subproduct)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ce982e0a",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.19"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
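
Note on the recorded output above: product accuracy is about 0.91 while macro-averaged precision is about 0.66. Macro averaging gives every label equal weight, so a few poorly predicted rare labels can pull it well below accuracy. In the other direction, `zero_division=1` scores any label that is never predicted as 1.0, which can raise the average. The pure-Python sketch below illustrates both effects on invented toy labels. `macro_precision` is a hypothetical helper written for illustration, not scikit-learn's implementation.

```python
# Sketch only: macro-averaged precision with a configurable value for
# labels that are never predicted (the role zero_division plays in
# sklearn.metrics.precision_score). Toy data is invented.

def macro_precision(y_true, y_pred, zero_division=1.0):
    """Unweighted mean of per-label precision over labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    per_label = []
    for label in labels:
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if p == label and t == label)
        if predicted == 0:
            # Label never predicted: precision is undefined, use the fallback.
            per_label.append(zero_division)
        else:
            per_label.append(correct / predicted)
    return sum(per_label) / len(per_label)


y_true = ["card", "card", "card", "loan"]
y_pred = ["card", "card", "card", "card"]  # "loan" is never predicted

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                                            # 0.75
print(macro_precision(y_true, y_pred, zero_division=1.0))  # 0.875
print(macro_precision(y_true, y_pred, zero_division=0.0))  # 0.375
```

The gap between the last two numbers suggests that reporting the metric with `zero_division=0` alongside `zero_division=1` may give a more conservative picture when some sub-products are never predicted.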

subproduct_prediction/.ipynb_checkpoints/Sub_Issue-checkpoint.ipynb ADDED

	@@ -0,0 +1,1900 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "a751d479-1500-41e2-8c01-252e849dad05",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import warnings\n",
+    "warnings.filterwarnings(\"ignore\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "8158cb66-9f9a-4bb2-bc6e-6a51146be10c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import matplotlib.pyplot as plt \n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+    "from sklearn.pipeline import make_pipeline\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "from sklearn.naive_bayes import MultinomialNB\n",
+    "from sklearn.svm import SVC\n",
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "from sklearn.metrics import classification_report,accuracy_score\n",
+    "import numpy as np\n",
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "from sklearn.preprocessing import OneHotEncoder\n",
+    "from sklearn.compose import ColumnTransformer\n",
+    "from sklearn.pipeline import Pipeline\n",
+    "from sklearn.pipeline import Pipeline\n",
+    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.metrics import classification_report, accuracy_score\n",
+    "from sklearn.utils.class_weight import compute_class_weight\n",
+    "import pickle"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "70ea935b-3b62-4cf9-8bef-06bf30904b20",
+   "metadata": {},
+   "source": [
+    "## Sub Issues"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f9ddaa89-dc8d-40f5-8098-7d108ab9d578",
+   "metadata": {},
+   "source": [
+    "### Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "c1f9fd85-f47e-4962-a693-7cb9efca763a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.pipeline import Pipeline\n",
+    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+    "from sklearn.metrics import accuracy_score, classification_report\n",
+    "from sklearn.utils.class_weight import compute_class_weight\n",
+    "\n",
+    "def train_model(training_df, validation_df, target_column, classifier_model, subissues_to_drop=None, random_state=42):\n",
+    "    # Drop specified sub-issues from training and validation dataframes\n",
+    "    if subissues_to_drop:\n",
+    "        training_df = training_df[~training_df[target_column].isin(subissues_to_drop)]\n",
+    "        validation_df = validation_df[~validation_df[target_column].isin(subissues_to_drop)]\n",
+    "    \n",
+    "    # Compute balanced class weights (note: they are not passed to classifier_model in this function)\n",
+    "    class_weights = compute_class_weight('balanced', classes=np.unique(training_df[target_column]), y=training_df[target_column])\n",
+    "    \n",
+    "    # Convert class weights to dictionary format\n",
+    "    class_weight = {label: weight for label, weight in zip(np.unique(training_df[target_column]), class_weights)}\n",
+    "    \n",
+    "    # Define a default class weight for missing classes\n",
+    "    default_class_weight = 0.5\n",
+    "    \n",
+    "    # Assign default class weight for missing classes\n",
+    "    for label in np.unique(training_df[target_column]):\n",
+    "        if label not in class_weight:\n",
+    "            class_weight[label] = default_class_weight\n",
+    "    \n",
+    "    # Define the pipeline\n",
+    "    pipeline = Pipeline([\n",
+    "        ('tfidf', TfidfVectorizer()),\n",
+    "        ('classifier', classifier_model)\n",
+    "    ])\n",
+    "    \n",
+    "    # Train the pipeline\n",
+    "    pipeline.fit(training_df['Consumer complaint narrative'], training_df[target_column])\n",
+    "    \n",
+    "    # Make predictions on the validation set\n",
+    "    y_pred = pipeline.predict(validation_df['Consumer complaint narrative'])\n",
+    "    \n",
+    "    # Evaluate the pipeline\n",
+    "    accuracy = accuracy_score(validation_df[target_column], y_pred)\n",
+    "    print(\"Accuracy:\", accuracy)\n",
+    "    print(\"\\nClassification Report:\")\n",
+    "    print(classification_report(validation_df[target_column], y_pred))\n",
+    "    \n",
+    "    return pipeline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7a0d277-75c1-4435-86e5-d0ee7d3dabf3",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Reading the Issue DataFrame"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "c1ea3fbc-4062-483b-a5c6-65d644983ce5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import pandas as pd\n",
+    "\n",
+    "def read_subissue_data(issue_name, data_dir='../data_preprocessing_scripts/issue_data_splits'):\n",
+    "    # Convert issue name to lower case and replace '/' and spaces with underscores\n",
+    "    issue_name = issue_name.replace('/', '_').replace(' ', '_').lower()\n",
+    "    \n",
+    "    # Construct file paths\n",
+    "    train_file = os.path.join(data_dir, f\"{issue_name}_train_data.csv\")\n",
+    "    val_file = os.path.join(data_dir, f\"{issue_name}_val_data.csv\")\n",
+    "    \n",
+    "    # Read the CSV files\n",
+    "    train_df = pd.read_csv(train_file)\n",
+    "    val_df = pd.read_csv(val_file)\n",
+    "    \n",
+    "    return train_df, val_df"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7a53f046-c7f8-48de-a8f3-9a66ffad5f55",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Incorrect Information on your report"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "665d036b-dd86-4cf5-a2ff-23358fb148c7",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "incorrect_information_train_df, incorrect_information_val_df = read_subissue_data('Incorrect information on your report')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "be2f7669-496b-4f5d-a4ab-1dd8137ec988",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Unnamed: 0</th>\n",
+       "      <th>Consumer complaint narrative</th>\n",
+       "      <th>Issue</th>\n",
+       "      <th>Sub-issue</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>153339</td>\n",
+       "      <td>XX/XX/XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX,...</td>\n",
+       "      <td>Incorrect information on your report</td>\n",
+       "      <td>Old information reappears or never goes away</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>160945</td>\n",
+       "      <td>This is my Follow-up request that I have been ...</td>\n",
+       "      <td>Incorrect information on your report</td>\n",
+       "      <td>Information belongs to someone else</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>41469</td>\n",
+       "      <td>This inquiry does not belong to me ; I have no...</td>\n",
+       "      <td>Incorrect information on your report</td>\n",
+       "      <td>Information belongs to someone else</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>34315</td>\n",
+       "      <td>I have items passed on that should be taken ou...</td>\n",
+       "      <td>Incorrect information on your report</td>\n",
+       "      <td>Information belongs to someone else</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>48970</td>\n",
+       "      <td>Im submitting a complaint to you today to info...</td>\n",
+       "      <td>Incorrect information on your report</td>\n",
+       "      <td>Information belongs to someone else</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   Unnamed: 0                       Consumer complaint narrative  \\\n",
+       "0      153339  XX/XX/XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX,...   \n",
+       "1      160945  This is my Follow-up request that I have been ...   \n",
+       "2       41469  This inquiry does not belong to me ; I have no...   \n",
+       "3       34315  I have items passed on that should be taken ou...   \n",
+       "4       48970  Im submitting a complaint to you today to info...   \n",
+       "\n",
+       "                                  Issue  \\\n",
+       "0  Incorrect information on your report   \n",
+       "1  Incorrect information on your report   \n",
+       "2  Incorrect information on your report   \n",
+       "3  Incorrect information on your report   \n",
+       "4  Incorrect information on your report   \n",
+       "\n",
+       "                                      Sub-issue  \n",
+       "0  Old information reappears or never goes away  \n",
+       "1           Information belongs to someone else  \n",
+       "2           Information belongs to someone else  \n",
+       "3           Information belongs to someone else  \n",
+       "4           Information belongs to someone else  "
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "incorrect_information_train_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "b78398b7-d027-403f-acf4-fa580d113b02",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.8831804281345565\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                     precision    recall  f1-score   support\n",
+      "\n",
+      "                      Account information incorrect       0.74      0.68      0.71       699\n",
+      "                           Account status incorrect       0.87      0.73      0.79       771\n",
+      "                Information belongs to someone else       0.90      0.99      0.94      4337\n",
+      "Information is missing that should be on the report       0.95      0.31      0.47        65\n",
+      "       Old information reappears or never goes away       0.93      0.40      0.56       126\n",
+      "                     Personal information incorrect       0.95      0.78      0.86       440\n",
+      "               Public record information inaccurate       0.98      0.47      0.64       102\n",
+      "\n",
+      "                                           accuracy                           0.88      6540\n",
+      "                                          macro avg       0.90      0.62      0.71      6540\n",
+      "                                       weighted avg       0.88      0.88      0.88      6540\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "trained_model_ii = train_model(incorrect_information_train_df, incorrect_information_val_df, 'Sub-issue', rf_classifier, random_state=42)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "85bbc3fe-50b0-4578-8e67-151861f839da",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('issue_models/incorrect_information_on_your_report.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_ii, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5c529ed8-3735-4494-9f90-6c005dfea6df",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Improper use of your report"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "f33b26e9-4c5b-4498-ab23-a88aca5eb07f",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Unnamed: 0</th>\n",
+       "      <th>Consumer complaint narrative</th>\n",
+       "      <th>Issue</th>\n",
+       "      <th>Sub-issue</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>75713</td>\n",
+       "      <td>I found inaccurate and incorrect data on my cr...</td>\n",
+       "      <td>Improper use of your report</td>\n",
+       "      <td>Credit inquiries on your report that you don't...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>72157</td>\n",
+       "      <td>HI I AM SUBMITTING THIS WITHOUT ANY INFLUENCE ...</td>\n",
+       "      <td>Improper use of your report</td>\n",
+       "      <td>Credit inquiries on your report that you don't...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>174012</td>\n",
+       "      <td>I checked my credit report and seen that there...</td>\n",
+       "      <td>Improper use of your report</td>\n",
+       "      <td>Credit inquiries on your report that you don't...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>131412</td>\n",
+       "      <td>XXXX XXXX XXXX XXXX has started to report inco...</td>\n",
+       "      <td>Improper use of your report</td>\n",
+       "      <td>Reporting company used your report improperly</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>157599</td>\n",
+       "      <td>My name is XXXX XXXX this complaint is not mad...</td>\n",
+       "      <td>Improper use of your report</td>\n",
+       "      <td>Reporting company used your report improperly</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   Unnamed: 0                       Consumer complaint narrative  \\\n",
+       "0       75713  I found inaccurate and incorrect data on my cr...   \n",
+       "1       72157  HI I AM SUBMITTING THIS WITHOUT ANY INFLUENCE ...   \n",
+       "2      174012  I checked my credit report and seen that there...   \n",
+       "3      131412  XXXX XXXX XXXX XXXX has started to report inco...   \n",
+       "4      157599  My name is XXXX XXXX this complaint is not mad...   \n",
+       "\n",
+       "                         Issue  \\\n",
+       "0  Improper use of your report   \n",
+       "1  Improper use of your report   \n",
+       "2  Improper use of your report   \n",
+       "3  Improper use of your report   \n",
+       "4  Improper use of your report   \n",
+       "\n",
+       "                                           Sub-issue  \n",
+       "0  Credit inquiries on your report that you don't...  \n",
+       "1  Credit inquiries on your report that you don't...  \n",
+       "2  Credit inquiries on your report that you don't...  \n",
+       "3      Reporting company used your report improperly  \n",
+       "4      Reporting company used your report improperly  "
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "improper_use_report_train_df, improper_use_report_val_df = read_subissue_data('Improper use of your report')\n",
+    "improper_use_report_train_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "c8dcc18b-f7bb-4edd-965a-8c58500a0ea6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.9528423772609819\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                          precision    recall  f1-score   support\n",
+      "\n",
+      "Credit inquiries on your report that you don't recognize       0.93      0.84      0.88       990\n",
+      "           Reporting company used your report improperly       0.96      0.98      0.97      3654\n",
+      "\n",
+      "                                                accuracy                           0.95      4644\n",
+      "                                               macro avg       0.95      0.91      0.93      4644\n",
+      "                                            weighted avg       0.95      0.95      0.95      4644\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "trained_model_iu = train_model(improper_use_report_train_df, improper_use_report_val_df, 'Sub-issue', rf_classifier, random_state=42)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 60,
+   "id": "a668b946-da36-410f-b474-f8a311952c5d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('issue_models/improper_use_of_your_report.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_iu, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "74796ebf-9934-46d2-a1b7-d6672dea727c",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Problem with a credit reporting company's investigation into an existing problem"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "id": "7cde4eda-37a1-4643-b62b-41e7be8f865f",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Unnamed: 0</th>\n",
+       "      <th>Consumer complaint narrative</th>\n",
+       "      <th>Issue</th>\n",
+       "      <th>Sub-issue</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>117380</td>\n",
+       "      <td>On XX/XX/2023 I sent a letter to XXXX, Experia...</td>\n",
+       "      <td>Problem with a credit reporting company's inve...</td>\n",
+       "      <td>Investigation took more than 30 days</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>172530</td>\n",
+       "      <td>XXXX XXXX XXXX XXXX XXXX, PA XXXX Please be ad...</td>\n",
+       "      <td>Problem with a credit reporting company's inve...</td>\n",
+       "      <td>Their investigation did not fix an error on yo...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>5336</td>\n",
+       "      <td>This creditor engaged in abusive, deceptive, a...</td>\n",
+       "      <td>Problem with a credit reporting company's inve...</td>\n",
+       "      <td>Was not notified of investigation status or re...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>63755</td>\n",
+       "      <td>Despite multiple written requests, the unverif...</td>\n",
+       "      <td>Problem with a credit reporting company's inve...</td>\n",
+       "      <td>Their investigation did not fix an error on yo...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>124437</td>\n",
+       "      <td>I have a loan with DEPT OF EDUCATION / XXXX. I...</td>\n",
+       "      <td>Problem with a credit reporting company's inve...</td>\n",
+       "      <td>Their investigation did not fix an error on yo...</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   Unnamed: 0                       Consumer complaint narrative  \\\n",
+       "0      117380  On XX/XX/2023 I sent a letter to XXXX, Experia...   \n",
+       "1      172530  XXXX XXXX XXXX XXXX XXXX, PA XXXX Please be ad...   \n",
+       "2        5336  This creditor engaged in abusive, deceptive, a...   \n",
+       "3       63755  Despite multiple written requests, the unverif...   \n",
+       "4      124437  I have a loan with DEPT OF EDUCATION / XXXX. I...   \n",
+       "\n",
+       "                                               Issue  \\\n",
+       "0  Problem with a credit reporting company's inve...   \n",
+       "1  Problem with a credit reporting company's inve...   \n",
+       "2  Problem with a credit reporting company's inve...   \n",
+       "3  Problem with a credit reporting company's inve...   \n",
+       "4  Problem with a credit reporting company's inve...   \n",
+       "\n",
+       "                                           Sub-issue  \n",
+       "0               Investigation took more than 30 days  \n",
+       "1  Their investigation did not fix an error on yo...  \n",
+       "2  Was not notified of investigation status or re...  \n",
+       "3  Their investigation did not fix an error on yo...  \n",
+       "4  Their investigation did not fix an error on yo...  "
+      ]
+     },
+     "execution_count": 25,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "problem_credit_reporting_train_df, problem_credit_reporting_val_df = read_subissue_data(\"Problem with a credit reporting company's investigation into an existing problem\")\n",
+    "\n",
+    "# Displaying the first few rows of the training data\n",
+    "problem_credit_reporting_train_df.head()\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "id": "1cc65f08-96c8-4458-8703-b84b7554a04c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.9288035450516987\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                       precision    recall  f1-score   support\n",
+      "\n",
+      "Difficulty submitting a dispute or getting information about a dispute over the phone       0.83      0.36      0.50        83\n",
+      "                                                 Investigation took more than 30 days       0.97      0.84      0.90       505\n",
+      "                                           Problem with personal statement of dispute       1.00      0.38      0.55        47\n",
+      "                              Their investigation did not fix an error on your report       0.92      0.99      0.95      2277\n",
+      "                                  Was not notified of investigation status or results       0.96      0.88      0.92       473\n",
+      "\n",
+      "                                                                             accuracy                           0.93      3385\n",
+      "                                                                            macro avg       0.94      0.69      0.77      3385\n",
+      "                                                                         weighted avg       0.93      0.93      0.92      3385\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "trained_model_problem_credit_reporting = train_model(problem_credit_reporting_train_df, problem_credit_reporting_val_df, 'Sub-issue', rf_classifier, random_state=42)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 69,
+   "id": "59c87ff1-d7de-41a9-9e0a-33630bff1c18",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open(\"issue_models/problem_with_a_credit_reporting_company's_investigation_into_an_existing_problem.pkl\", 'wb') as f:\n",
+    "    pickle.dump(trained_model_problem_credit_reporting, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fe443859-4be6-4b87-be79-22487aaf5b3b",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Problem with a company's investigation into an existing problem"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "id": "31a70db8-06cb-4fb0-8d45-a7451aa81b0e",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Unnamed: 0</th>\n",
+       "      <th>Consumer complaint narrative</th>\n",
+       "      <th>Issue</th>\n",
+       "      <th>Sub-issue</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>30922</td>\n",
+       "      <td>I have filed numerous FTC reports and disputes...</td>\n",
+       "      <td>Problem with a company's investigation into an...</td>\n",
+       "      <td>Investigation took more than 30 days</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>6933</td>\n",
+       "      <td>I filed a dispute for incorrect information on...</td>\n",
+       "      <td>Problem with a company's investigation into an...</td>\n",
+       "      <td>Their investigation did not fix an error on yo...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>34620</td>\n",
+       "      <td>When I reviewed my credit report, I discovered...</td>\n",
+       "      <td>Problem with a company's investigation into an...</td>\n",
+       "      <td>Their investigation did not fix an error on yo...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>56460</td>\n",
+       "      <td>I am writing to convey my ongoing concern rega...</td>\n",
+       "      <td>Problem with a company's investigation into an...</td>\n",
+       "      <td>Their investigation did not fix an error on yo...</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>128600</td>\n",
+       "      <td>When I reviewed my credit report, I discovered...</td>\n",
+       "      <td>Problem with a company's investigation into an...</td>\n",
+       "      <td>Their investigation did not fix an error on yo...</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   Unnamed: 0                       Consumer complaint narrative  \\\n",
+       "0       30922  I have filed numerous FTC reports and disputes...   \n",
+       "1        6933  I filed a dispute for incorrect information on...   \n",
+       "2       34620  When I reviewed my credit report, I discovered...   \n",
+       "3       56460  I am writing to convey my ongoing concern rega...   \n",
+       "4      128600  When I reviewed my credit report, I discovered...   \n",
+       "\n",
+       "                                               Issue  \\\n",
+       "0  Problem with a company's investigation into an...   \n",
+       "1  Problem with a company's investigation into an...   \n",
+       "2  Problem with a company's investigation into an...   \n",
+       "3  Problem with a company's investigation into an...   \n",
+       "4  Problem with a company's investigation into an...   \n",
+       "\n",
+       "                                           Sub-issue  \n",
+       "0               Investigation took more than 30 days  \n",
+       "1  Their investigation did not fix an error on yo...  \n",
+       "2  Their investigation did not fix an error on yo...  \n",
+       "3  Their investigation did not fix an error on yo...  \n",
+       "4  Their investigation did not fix an error on yo...  "
+      ]
+     },
+     "execution_count": 28,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Reading the data\n",
+    "problem_company_investigation_train_df, problem_company_investigation_val_df = read_subissue_data(\"Problem with a company's investigation into an existing problem\")\n",
+    "\n",
+    "# Displaying the first few rows of the training data\n",
+    "problem_company_investigation_train_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "0e70a22d-01f9-4f59-a903-286a05eb5179",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.9199747952110902\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                       precision    recall  f1-score   support\n",
+      "\n",
+      "Difficulty submitting a dispute or getting information about a dispute over the phone       0.88      0.37      0.52        41\n",
+      "                                                 Investigation took more than 30 days       0.95      0.73      0.83       162\n",
+      "                                           Problem with personal statement of dispute       0.90      0.53      0.67        53\n",
+      "                              Their investigation did not fix an error on your report       0.91      1.00      0.95      1122\n",
+      "                                  Was not notified of investigation status or results       0.98      0.87      0.92       209\n",
+      "\n",
+      "                                                                             accuracy                           0.92      1587\n",
+      "                                                                            macro avg       0.93      0.70      0.78      1587\n",
+      "                                                                         weighted avg       0.92      0.92      0.91      1587\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "trained_model_problem_company_investigation = train_model(problem_company_investigation_train_df, problem_company_investigation_val_df, 'Sub-issue', rf_classifier, random_state=42)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 68,
+   "id": "ac3f39d0-8cb8-457e-9db7-510cc5a99830",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open(\"issue_models/problem_with_a_company's_investigation_into_an_existing_problem.pkl\", 'wb') as f:\n",
+    "    pickle.dump(trained_model_problem_company_investigation, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0787d4eb-9673-417b-91d1-cc98becd037e",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Managing an account"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "id": "8e074864-16f6-4fd5-8bfe-b054aeb0fc2a",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Unnamed: 0</th>\n",
+       "      <th>Consumer complaint narrative</th>\n",
+       "      <th>Issue</th>\n",
+       "      <th>Sub-issue</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>37312</td>\n",
+       "      <td>On XX/XX/2023 I had XXXX in my savings account...</td>\n",
+       "      <td>Managing an account</td>\n",
+       "      <td>Fee problem</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>92449</td>\n",
+       "      <td>I recently opened a new account with this bank...</td>\n",
+       "      <td>Managing an account</td>\n",
+       "      <td>Deposits and withdrawals</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>169426</td>\n",
+       "      <td>Wells Fargo bank has leaked my account details...</td>\n",
+       "      <td>Managing an account</td>\n",
+       "      <td>Deposits and withdrawals</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>60751</td>\n",
+       "      <td>I disputed two transactions on my Wells Fargo ...</td>\n",
+       "      <td>Managing an account</td>\n",
+       "      <td>Problem using a debit or ATM card</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>169432</td>\n",
+       "      <td>On XX/XX/23 someone hacked my XXXX app and ord...</td>\n",
+       "      <td>Managing an account</td>\n",
+       "      <td>Funds not handled or disbursed as instructed</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   Unnamed: 0                       Consumer complaint narrative  \\\n",
+       "0       37312  On XX/XX/2023 I had XXXX in my savings account...   \n",
+       "1       92449  I recently opened a new account with this bank...   \n",
+       "2      169426  Wells Fargo bank has leaked my account details...   \n",
+       "3       60751  I disputed two transactions on my Wells Fargo ...   \n",
+       "4      169432  On XX/XX/23 someone hacked my XXXX app and ord...   \n",
+       "\n",
+       "                 Issue                                     Sub-issue  \n",
+       "0  Managing an account                                   Fee problem  \n",
+       "1  Managing an account                      Deposits and withdrawals  \n",
+       "2  Managing an account                      Deposits and withdrawals  \n",
+       "3  Managing an account             Problem using a debit or ATM card  \n",
+       "4  Managing an account  Funds not handled or disbursed as instructed  "
+      ]
+     },
+     "execution_count": 30,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "# Reading the train and validation splits for the 'Managing an account' issue\n",
+    "managing_account_train_df, managing_account_val_df = read_subissue_data(\"Managing an account\")\n",
+    "\n",
+    "# Displaying the first few rows of the training data\n",
+    "managing_account_train_df.head()\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "id": "57257613-7dde-4561-942c-f559d2159744",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.5171171171171172\n",
+      "\n",
+      "Classification Report:\n",
+      "                                              precision    recall  f1-score   support\n",
+      "\n",
+      "                              Banking errors       0.50      0.10      0.16        73\n",
+      "                    Deposits and withdrawals       0.47      0.90      0.61       201\n",
+      "                                 Fee problem       0.56      0.59      0.57        56\n",
+      "Funds not handled or disbursed as instructed       0.00      0.00      0.00        72\n",
+      "                   Problem accessing account       0.00      0.00      0.00        40\n",
+      "           Problem using a debit or ATM card       0.70      0.58      0.64       113\n",
+      "\n",
+      "                                    accuracy                           0.52       555\n",
+      "                                   macro avg       0.37      0.36      0.33       555\n",
+      "                                weighted avg       0.43      0.52      0.43       555\n",
+      "\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/shivanimundle/opt/anaconda3/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
+      "  _warn_prf(average, modifier, msg_start, len(result))\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_managing_account = train_model(managing_account_train_df, managing_account_val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 75,
+   "id": "cca27513-501f-4257-a4b1-0e13a3604250",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.9841168996188056\n",
+      "\n",
+      "Classification Report:\n",
+      "                                precision    recall  f1-score   support\n",
+      "\n",
+      "              Credit reporting       0.99      1.00      0.99      1500\n",
+      "Other personal consumer report       0.93      0.72      0.81        74\n",
+      "\n",
+      "                      accuracy                           0.98      1574\n",
+      "                     macro avg       0.96      0.86      0.90      1574\n",
+      "                  weighted avg       0.98      0.98      0.98      1574\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Save the trained model to a file\n",
+    "with open('models/Managing_account_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_managing_account, f)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 76,
+   "id": "3cbb9aa5-6c0c-4b59-a181-7431e8fc60fc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('models/Credit_Reporting_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_cr, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "93af6b14-f33a-479b-8b6a-79d6621309ed",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Attempts to collect debt not owed"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "id": "384692cb-09ee-453e-910e-5179f3a33b9d",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Unnamed: 0</th>\n",
+       "      <th>Consumer complaint narrative</th>\n",
+       "      <th>Issue</th>\n",
+       "      <th>Sub-issue</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>74601</td>\n",
+       "      <td>I had a mobile number with XXXX XXXX for sever...</td>\n",
+       "      <td>Attempts to collect debt not owed</td>\n",
+       "      <td>Debt is not yours</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>126394</td>\n",
+       "      <td>When running my credit report I notice a few c...</td>\n",
+       "      <td>Attempts to collect debt not owed</td>\n",
+       "      <td>Debt was result of identity theft</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>145518</td>\n",
+       "      <td>In early XXXX, XXXX I received notice via the ...</td>\n",
+       "      <td>Attempts to collect debt not owed</td>\n",
+       "      <td>Debt is not yours</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>59439</td>\n",
+       "      <td>I was sent via U.S. mail a debt collection not...</td>\n",
+       "      <td>Attempts to collect debt not owed</td>\n",
+       "      <td>Debt was result of identity theft</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>116810</td>\n",
+       "      <td>This debt collector engaged in abusive, decept...</td>\n",
+       "      <td>Attempts to collect debt not owed</td>\n",
+       "      <td>Debt is not yours</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   Unnamed: 0                       Consumer complaint narrative  \\\n",
+       "0       74601  I had a mobile number with XXXX XXXX for sever...   \n",
+       "1      126394  When running my credit report I notice a few c...   \n",
+       "2      145518  In early XXXX, XXXX I received notice via the ...   \n",
+       "3       59439  I was sent via U.S. mail a debt collection not...   \n",
+       "4      116810  This debt collector engaged in abusive, decept...   \n",
+       "\n",
+       "                               Issue                          Sub-issue  \n",
+       "0  Attempts to collect debt not owed                  Debt is not yours  \n",
+       "1  Attempts to collect debt not owed  Debt was result of identity theft  \n",
+       "2  Attempts to collect debt not owed                  Debt is not yours  \n",
+       "3  Attempts to collect debt not owed  Debt was result of identity theft  \n",
+       "4  Attempts to collect debt not owed                  Debt is not yours  "
+      ]
+     },
+     "execution_count": 32,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "debt_collection_train_df, debt_collection_val_df = read_subissue_data(\"Attempts to collect debt not owed\")\n",
+    "\n",
+    "# Displaying the first few rows of the training data\n",
+    "debt_collection_train_df.head()\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "id": "1cd16300-096b-43f2-aa9c-9500fbcdd0bd",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.7009803921568627\n",
+      "\n",
+      "Classification Report:\n",
+      "                                   precision    recall  f1-score   support\n",
+      "\n",
+      "                Debt is not yours       0.64      0.93      0.76       207\n",
+      "                    Debt was paid       0.96      0.31      0.46        72\n",
+      "Debt was result of identity theft       0.84      0.56      0.67       129\n",
+      "\n",
+      "                         accuracy                           0.70       408\n",
+      "                        macro avg       0.81      0.60      0.63       408\n",
+      "                     weighted avg       0.76      0.70      0.68       408\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_debt_collection = train_model(debt_collection_train_df, debt_collection_val_df, 'Sub-issue', rf_classifier, random_state=42)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9967357b-a3ec-44da-9dfb-a2034a673e8d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('models/Debt_collection_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_debt_collection, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "00777cb3-8df2-4b27-8978-eeb008042f0f",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Problem with a purchase shown on your statement"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "id": "9400ec96-05e7-4458-bc19-0ef544709004",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.7479338842975206\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                  precision    recall  f1-score   support\n",
+      "\n",
+      "               Card was charged for something you did not purchase with the card       0.76      0.19      0.30        70\n",
+      "Credit card company isn't resolving a dispute about a purchase on your statement       0.75      0.98      0.85       172\n",
+      "\n",
+      "                                                                        accuracy                           0.75       242\n",
+      "                                                                       macro avg       0.76      0.58      0.57       242\n",
+      "                                                                    weighted avg       0.75      0.75      0.69       242\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load the training and validation splits for this issue\n",
+    "purchase_problem_train_df, purchase_problem_val_df = read_subissue_data(\"Problem with a purchase shown on your statement\")\n",
+    "\n",
+    "# Preview the first rows (not rendered: only a cell's last expression is displayed)\n",
+    "purchase_problem_train_df.head()\n",
+    "\n",
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_purchase_problem = train_model(purchase_problem_train_df, purchase_problem_val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "446b56ef-54c9-4975-a4ab-4982bf2585b8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Save the trained model to a file\n",
+    "with open('models/Purchase_problem_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_purchase_problem, f)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "25526885-aabf-4257-b5c9-4e1c5133a96a",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Account Operations and Unauthorized Transaction Issues"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "id": "35916c05-e001-462c-91a2-aded09da6e6c",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.8586956521739131\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                  precision    recall  f1-score   support\n",
+      "\n",
+      "             Account opened as a result of fraud       0.83      0.67      0.74        43\n",
+      "Card opened as result of identity theft or fraud       0.88      0.77      0.82        39\n",
+      "                  Transaction was not authorized       0.86      0.97      0.91       102\n",
+      "\n",
+      "                                        accuracy                           0.86       184\n",
+      "                                       macro avg       0.86      0.80      0.83       184\n",
+      "                                    weighted avg       0.86      0.86      0.85       184\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load the training and validation splits for this issue\n",
+    "account_operations_train_df, account_operations_val_df = read_subissue_data(\"Account Operations and Unauthorized Transaction Issues\")\n",
+    "\n",
+    "# Preview the first rows (not rendered: only a cell's last expression is displayed)\n",
+    "account_operations_train_df.head()\n",
+    "\n",
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_account_operations = train_model(account_operations_train_df, account_operations_val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7b35fb47-ad1f-44e7-a952-c8e75118080f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Save the trained model to a file\n",
+    "with open('models/Account_operations_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_account_operations, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "913129c1-9e06-407a-bc4b-1974f9f984bd",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Payment and Funds Management"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "id": "e1575ee1-a8e8-4aa2-ab42-1bf88d2759de",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.8728323699421965\n",
+      "\n",
+      "Classification Report:\n",
+      "                                precision    recall  f1-score   support\n",
+      "\n",
+      "               Billing problem       1.00      0.65      0.79        34\n",
+      " Overdrafts and overdraft fees       0.89      0.92      0.91        74\n",
+      "Problem during payment process       0.81      0.94      0.87        65\n",
+      "\n",
+      "                      accuracy                           0.87       173\n",
+      "                     macro avg       0.90      0.83      0.85       173\n",
+      "                  weighted avg       0.88      0.87      0.87       173\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load the training and validation splits for this issue\n",
+    "payment_funds_train_df, payment_funds_val_df = read_subissue_data(\"Payment and Funds Management\")\n",
+    "\n",
+    "# Preview the first rows (not rendered: only a cell's last expression is displayed)\n",
+    "payment_funds_train_df.head()\n",
+    "\n",
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_payment_funds = train_model(payment_funds_train_df, payment_funds_val_df, 'Sub-issue', rf_classifier, random_state=42)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "fd2b3201-b2d9-4943-af2c-b8813bb5379b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Save the trained model to a file\n",
+    "with open('models/Payment_funds_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_payment_funds, f)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "621c0a53-5aca-4d17-bf86-e9b8b98f76e5",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Written notification about debt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "id": "ecdaaba3-1882-486e-82ee-ade1c0b83eb1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.7814207650273224\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                  precision    recall  f1-score   support\n",
+      "\n",
+      "Didn't receive enough information to verify debt       0.77      0.99      0.87       135\n",
+      "       Didn't receive notice of right to dispute       0.90      0.19      0.31        48\n",
+      "\n",
+      "                                        accuracy                           0.78       183\n",
+      "                                       macro avg       0.84      0.59      0.59       183\n",
+      "                                    weighted avg       0.81      0.78      0.72       183\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load the training and validation splits for this issue\n",
+    "notification_debt_train_df, notification_debt_val_df = read_subissue_data(\"Written notification about debt\")\n",
+    "\n",
+    "# Preview the first rows (not rendered: only a cell's last expression is displayed)\n",
+    "notification_debt_train_df.head()\n",
+    "\n",
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_notification_debt = train_model(notification_debt_train_df, notification_debt_val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "68697a56-7bdf-4fbd-9d1d-e6c4dbcc7c74",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Save the trained model to a file\n",
+    "with open('models/Notification_debt_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_notification_debt, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c31597eb-7601-4cfe-a779-f7de38e7e8cc",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Dealing with your lender or servicer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "id": "36511f84-069e-4d71-9089-a454f2707467",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.7244897959183674\n",
+      "\n",
+      "Classification Report:\n",
+      "                                             precision    recall  f1-score   support\n",
+      "\n",
+      "   Received bad information about your loan       0.74      0.70      0.72        50\n",
+      "Trouble with how payments are being handled       0.71      0.75      0.73        48\n",
+      "\n",
+      "                                   accuracy                           0.72        98\n",
+      "                                  macro avg       0.73      0.72      0.72        98\n",
+      "                               weighted avg       0.73      0.72      0.72        98\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load the training and validation splits for this issue\n",
+    "lender_servicer_train_df, lender_servicer_val_df = read_subissue_data(\"Dealing with your lender or servicer\")\n",
+    "\n",
+    "# Preview the first rows (not rendered: only a cell's last expression is displayed)\n",
+    "lender_servicer_train_df.head()\n",
+    "\n",
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_lender_servicer = train_model(lender_servicer_train_df, lender_servicer_val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9aee0547-5d02-4ff1-ba8d-858ddd6590a6",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Save the trained model to a file\n",
+    "with open('models/Lender_servicer_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_lender_servicer, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ac9b6231-dff8-4490-a022-ac1519b77405",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Disputes and Misrepresentations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "id": "d60d9dd5-b1e7-44b9-9ad2-dd5ae5e4060f",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.8418079096045198\n",
+      "\n",
+      "Classification Report:\n",
+      "                                   precision    recall  f1-score   support\n",
+      "\n",
+      "Attempted to collect wrong amount       0.85      0.92      0.88        66\n",
+      "                    Other problem       0.85      0.65      0.74        54\n",
+      "                Problem with fees       0.83      0.93      0.88        57\n",
+      "\n",
+      "                         accuracy                           0.84       177\n",
+      "                        macro avg       0.84      0.83      0.83       177\n",
+      "                     weighted avg       0.84      0.84      0.84       177\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load the training and validation splits for this issue\n",
+    "disputes_misrepresentations_train_df, disputes_misrepresentations_val_df = read_subissue_data(\"Disputes and Misrepresentations\")\n",
+    "\n",
+    "# Preview the first rows (not rendered: only a cell's last expression is displayed)\n",
+    "disputes_misrepresentations_train_df.head()\n",
+    "\n",
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_disputes_misrepresentations = train_model(disputes_misrepresentations_train_df, disputes_misrepresentations_val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8bc31a9a-2725-46cb-ad25-1e60721dc0b0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Save the trained model to a file\n",
+    "with open('models/Disputes_misrepresentations_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_disputes_misrepresentations, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "83967347-b3ec-4aad-b87f-b06b8752e184",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Problem with a company's investigation into an existing issue"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "id": "1fe01200-373a-444a-b684-06f6a36eb447",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.5882352941176471\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                       precision    recall  f1-score   support\n",
+      "\n",
+      "Difficulty submitting a dispute or getting information about a dispute over the phone       0.00      0.00      0.00         3\n",
+      "                                                 Investigation took more than 30 days       1.00      1.00      1.00         3\n",
+      "                                           Problem with personal statement of dispute       0.00      0.00      0.00         2\n",
+      "                              Their investigation did not fix an error on your report       0.50      1.00      0.67         7\n",
+      "                                  Was not notified of investigation status or results       0.00      0.00      0.00         2\n",
+      "\n",
+      "                                                                             accuracy                           0.59        17\n",
+      "                                                                            macro avg       0.30      0.40      0.33        17\n",
+      "                                                                         weighted avg       0.38      0.59      0.45        17\n",
+      "\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "/Users/shivanimundle/opt/anaconda3/lib/python3.9/site-packages/sklearn/metrics/_classification.py:1344: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
+      "  _warn_prf(average, modifier, msg_start, len(result))\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load the training and validation splits for this issue\n",
+    "investigation_issue_train_df, investigation_issue_val_df = read_subissue_data(\"Problem with a company's investigation into an existing issue\")\n",
+    "\n",
+    "# Preview the first rows (not rendered: only a cell's last expression is displayed)\n",
+    "investigation_issue_train_df.head()\n",
+    "\n",
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_investigation_issue = train_model(investigation_issue_train_df, investigation_issue_val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f7541d40-be19-4570-8863-11329cdcd6a2",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Save the trained model to a file\n",
+    "with open('models/Investigation_issue_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_investigation_issue, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e4a5e9a1-1e04-4e6f-888b-3cb417d8a89f",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Closing your account"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "id": "f1c81af1-7378-4d35-923b-1cdfb3e16b47",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.7936507936507936\n",
+      "\n",
+      "Classification Report:\n",
+      "                             precision    recall  f1-score   support\n",
+      "\n",
+      "   Can't close your account       1.00      0.24      0.38        17\n",
+      "Company closed your account       0.78      1.00      0.88        46\n",
+      "\n",
+      "                   accuracy                           0.79        63\n",
+      "                  macro avg       0.89      0.62      0.63        63\n",
+      "               weighted avg       0.84      0.79      0.74        63\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load the training and validation splits for this issue\n",
+    "closing_account_train_df, closing_account_val_df = read_subissue_data(\"Closing your account\")\n",
+    "\n",
+    "# Preview the first rows (not rendered: only a cell's last expression is displayed)\n",
+    "closing_account_train_df.head()\n",
+    "\n",
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_closing_account = train_model(closing_account_train_df, closing_account_val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "da02d848-8a33-4694-a1e8-51cd16904374",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Save the trained model to a file\n",
+    "with open('models/Closing_account_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_closing_account, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bf8e194c-18d3-4958-8a95-ace85b32bf0d",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### Credit Report and Monitoring Issues"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "id": "798c24ec-678c-48e5-a763-641f0f6b4da1",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.9098360655737705\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                          precision    recall  f1-score   support\n",
+      "\n",
+      "                       Other problem getting your report or credit score       0.89      0.99      0.94        82\n",
+      "Problem canceling credit monitoring or identify theft protection service       0.97      0.75      0.85        40\n",
+      "\n",
+      "                                                                accuracy                           0.91       122\n",
+      "                                                               macro avg       0.93      0.87      0.89       122\n",
+      "                                                            weighted avg       0.92      0.91      0.91       122\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load the training and validation splits for this issue\n",
+    "credit_report_train_df, credit_report_val_df = read_subissue_data(\"Credit Report and Monitoring Issues\")\n",
+    "\n",
+    "# Preview the first rows (not rendered: only a cell's last expression is displayed)\n",
+    "credit_report_train_df.head()\n",
+    "\n",
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_credit_report = train_model(credit_report_train_df, credit_report_val_df, 'Sub-issue', rf_classifier, random_state=42)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2e49e772-1351-4c2c-905a-0f77b6169268",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Save the trained model to a file\n",
+    "with open('models/Credit_report_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_credit_report, f)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d4384e07-0b29-4239-9404-cceaeece2a7c",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### 'Closing an account':"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "id": "d7270a5a-4e07-4841-8f1a-600f01940f98",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.5684931506849316\n",
+      "\n",
+      "Classification Report:\n",
+      "                                        precision    recall  f1-score   support\n",
+      "\n",
+      "              Can't close your account       1.00      0.04      0.07        27\n",
+      "           Company closed your account       0.57      0.83      0.67        69\n",
+      "Funds not received from closed account       0.56      0.50      0.53        50\n",
+      "\n",
+      "                              accuracy                           0.57       146\n",
+      "                             macro avg       0.71      0.45      0.42       146\n",
+      "                          weighted avg       0.64      0.57      0.51       146\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load the training and validation splits for this issue\n",
+    "closing_account_train_df, closing_account_val_df = read_subissue_data(\"Closing an account\")\n",
+    "\n",
+    "# Peek at the first few rows (not rendered, because this is not the cell's last expression)\n",
+    "closing_account_train_df.head()\n",
+    "\n",
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_closing_account = train_model(closing_account_train_df, closing_account_val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "79c54f47-5fdd-4db4-a70d-ae7fe3068fdb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# Save the trained model to a file\n",
+    "with open('models/Closing_account_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_closing_account, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "157b5a71-5b58-4a2a-ae42-b5299660a422",
+   "metadata": {
+    "jp-MarkdownHeadingCollapsed": true
+   },
+   "source": [
+    "#### 'Legal and Threat Actions':"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "id": "8cf7f8ee-c4f1-4b71-901f-74e260e6c700",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 1.0\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                      precision    recall  f1-score   support\n",
+      "\n",
+      "Threatened or suggested your credit would be damaged       1.00      1.00      1.00        48\n",
+      "\n",
+      "                                            accuracy                           1.00        48\n",
+      "                                           macro avg       1.00      1.00      1.00        48\n",
+      "                                        weighted avg       1.00      1.00      1.00        48\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "# Load the training and validation splits for this issue\n",
+    "legal_threat_actions_train_df, legal_threat_actions_val_df = read_subissue_data(\"Legal and Threat Actions\")\n",
+    "\n",
+    "# Peek at the first few rows (not rendered, because this is not the cell's last expression)\n",
+    "legal_threat_actions_train_df.head()\n",
+    "\n",
+    "# Initialize the RandomForestClassifier\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "\n",
+    "# Train the model using the updated training and validation datasets\n",
+    "trained_model_legal_threat_actions = train_model(legal_threat_actions_train_df, legal_threat_actions_val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "7e1bbe22-ced3-49f9-914e-b9ef713153cc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Save the trained model to a file\n",
+    "with open('models/Legal_threat_actions_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_legal_threat_actions, f)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0f7446a2-3e93-46fc-8710-cae1db734297",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.19"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
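
Note on the results above: the 'Legal and Threat Actions' validation split contains a single sub-issue (48 rows), so its accuracy of 1.0 carries no information, and in 'Closing an account' the largest class alone covers 69 of 146 validation rows. The following is a minimal sketch of a check that could run before training, assuming the labels are available as a plain list. `label_summary` and `is_trivial_split` are hypothetical helper names, not functions from the notebook, and the label lists are rebuilt from the support counts in the reports rather than read from the data files.

```python
from collections import Counter


def label_summary(labels):
    """Return (number of classes, share of the largest class) for the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("labels must not be empty")
    return len(counts), max(counts.values()) / total


def is_trivial_split(labels):
    """True when fewer than two classes are present, so a classifier has nothing to learn."""
    n_classes, _ = label_summary(labels)
    return n_classes < 2


# Label lists rebuilt from the support columns of the reports (assumption:
# the real splits would be read from the CSV files instead).
legal_labels = ["Threatened or suggested your credit would be damaged"] * 48
closing_labels = (
    ["Can't close your account"] * 27
    + ["Company closed your account"] * 69
    + ["Funds not received from closed account"] * 50
)

legal_summary = label_summary(legal_labels)
closing_summary = label_summary(closing_labels)

print("Legal and Threat Actions trivial:", is_trivial_split(legal_labels))
print("Closing an account classes and majority share:", closing_summary)
```

In the notebook, a check like this could be applied to the `'Sub-issue'` column of a training split before calling `train_model`, so that single-class issues are skipped or reported separately. The majority share also gives a baseline to compare the reported accuracy against.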

subproduct_prediction/.ipynb_checkpoints/Sub_Issues-modified-checkpoint.ipynb ADDED

	@@ -0,0 +1,990 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "a751d479-1500-41e2-8c01-252e849dad05",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import warnings\n",
+    "warnings.filterwarnings(\"ignore\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "8158cb66-9f9a-4bb2-bc6e-6a51146be10c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pickle\n",
+    "\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from sklearn.compose import ColumnTransformer\n",
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "from sklearn.metrics import accuracy_score, classification_report\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.naive_bayes import MultinomialNB\n",
+    "from sklearn.pipeline import Pipeline, make_pipeline\n",
+    "from sklearn.preprocessing import OneHotEncoder\n",
+    "from sklearn.svm import SVC\n",
+    "from sklearn.utils.class_weight import compute_class_weight"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "70ea935b-3b62-4cf9-8bef-06bf30904b20",
+   "metadata": {},
+   "source": [
+    "## Sub Issues"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f9ddaa89-dc8d-40f5-8098-7d108ab9d578",
+   "metadata": {},
+   "source": [
+    "### Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "c1f9fd85-f47e-4962-a693-7cb9efca763a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.pipeline import Pipeline\n",
+    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+    "from sklearn.metrics import accuracy_score, classification_report\n",
+    "from sklearn.utils.class_weight import compute_class_weight\n",
+    "\n",
+    "def train_model(training_df, validation_df, target_column, classifier_model, subissues_to_drop=None, random_state=42):\n",
+    "    # NOTE: random_state is accepted for call compatibility but is not used here;\n",
+    "    # the seed is set on classifier_model itself.\n",
+    "    # Drop the specified sub-issues from the training and validation dataframes\n",
+    "    if subissues_to_drop:\n",
+    "        training_df = training_df[~training_df[target_column].isin(subissues_to_drop)]\n",
+    "        validation_df = validation_df[~validation_df[target_column].isin(subissues_to_drop)]\n",
+    "    \n",
+    "    # Compute balanced class weights as a label -> weight dictionary.\n",
+    "    # NOTE: class_weight is not passed to classifier_model, so it does not affect training.\n",
+    "    classes = np.unique(training_df[target_column])\n",
+    "    class_weights = compute_class_weight('balanced', classes=classes, y=training_df[target_column])\n",
+    "    class_weight = dict(zip(classes, class_weights))\n",
+    "    \n",
+    "    # Define the pipeline\n",
+    "    pipeline = Pipeline([\n",
+    "        ('tfidf', TfidfVectorizer()),\n",
+    "        ('classifier', classifier_model)\n",
+    "    ])\n",
+    "    \n",
+    "    # Train the pipeline\n",
+    "    pipeline.fit(training_df['Consumer complaint narrative'], training_df[target_column])\n",
+    "    \n",
+    "    # Make predictions on the validation set\n",
+    "    y_pred = pipeline.predict(validation_df['Consumer complaint narrative'])\n",
+    "    \n",
+    "    # Evaluate the pipeline\n",
+    "    accuracy = accuracy_score(validation_df[target_column], y_pred)\n",
+    "    print(\"\\nClassification Report:\")\n",
+    "    print(classification_report(validation_df[target_column], y_pred))\n",
+    "    print(\"Accuracy:\", accuracy)\n",
+    "    \n",
+    "    return pipeline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7a0d277-75c1-4435-86e5-d0ee7d3dabf3",
+   "metadata": {},
+   "source": [
+    "#### Reading the Issue DataFrame"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "id": "c1ea3fbc-4062-483b-a5c6-65d644983ce5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import pandas as pd\n",
+    "\n",
+    "def read_subissue_data(issue_name, data_dir='../data_preprocessing_scripts/issue_data_splits'):\n",
+    "    # Convert issue name to lower case and replace '/' and spaces with underscores\n",
+    "    issue_name = issue_name.replace('/', '_').replace(' ', '_').lower()\n",
+    "    \n",
+    "    # Construct file paths\n",
+    "    train_file = os.path.join(data_dir, f\"{issue_name}_train_data.csv\")\n",
+    "    val_file = os.path.join(data_dir, f\"{issue_name}_val_data.csv\")\n",
+    "    \n",
+    "    # Read the CSV files\n",
+    "    train_df = pd.read_csv(train_file)\n",
+    "    val_df = pd.read_csv(val_file)\n",
+    "    \n",
+    "    return train_df, val_df"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "id": "ae74f945-3fe9-4207-8fe0-fb4d8c5d2a27",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = pd.read_csv(\"../data_splits/train-data-split.csv\")\n",
+    "issue_categories = list(df['Issue'].unique())\n",
+    "\n",
+    "def classify_sub_issue(issue):\n",
+    "    # Train a sub-issue classifier for one issue and pickle it under issue_models/\n",
+    "    issue_name = issue.replace('/', '_').replace(' ', '_').lower()\n",
+    "    train_df, val_df = read_subissue_data(issue)\n",
+    "    rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "    trained_model = train_model(train_df, val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n",
+    "    # Saving the model\n",
+    "    with open(f\"issue_models/{issue_name}.pkl\", 'wb') as f:\n",
+    "        pickle.dump(trained_model, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0540f68f-4e14-40c2-ba9e-1875138678a1",
+   "metadata": {},
+   "source": [
+    "### Sub-issues classification"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7a53f046-c7f8-48de-a8f3-9a66ffad5f55",
+   "metadata": {},
+   "source": [
+    "#### 1. Problem with a company's investigation into an existing problem"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "id": "a33a3974-b3e9-466c-85a9-8d9b0255bbba",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Problem with a company's investigation into an existing problem\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                       precision    recall  f1-score   support\n",
+      "\n",
+      "Difficulty submitting a dispute or getting information about a dispute over the phone       0.88      0.37      0.52        41\n",
+      "                                                 Investigation took more than 30 days       0.95      0.73      0.83       162\n",
+      "                                           Problem with personal statement of dispute       0.90      0.53      0.67        53\n",
+      "                              Their investigation did not fix an error on your report       0.91      1.00      0.95      1122\n",
+      "                                  Was not notified of investigation status or results       0.98      0.87      0.92       209\n",
+      "\n",
+      "                                                                             accuracy                           0.92      1587\n",
+      "                                                                            macro avg       0.93      0.70      0.78      1587\n",
+      "                                                                         weighted avg       0.92      0.92      0.91      1587\n",
+      "\n",
+      "Accuracy: 0.9199747952110902\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[0]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4ffa280b-614f-48b2-9870-70fb053b45b6",
+   "metadata": {},
+   "source": [
+    "#### 2. Incorrect information on your report"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "id": "3d431635-227e-4873-b017-8cb4180a6e2e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Incorrect information on your report\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                     precision    recall  f1-score   support\n",
+      "\n",
+      "                      Account information incorrect       0.74      0.68      0.71       699\n",
+      "                           Account status incorrect       0.87      0.73      0.79       771\n",
+      "                Information belongs to someone else       0.90      0.99      0.94      4337\n",
+      "Information is missing that should be on the report       0.95      0.31      0.47        65\n",
+      "       Old information reappears or never goes away       0.93      0.40      0.56       126\n",
+      "                     Personal information incorrect       0.95      0.78      0.86       440\n",
+      "               Public record information inaccurate       0.98      0.47      0.64       102\n",
+      "\n",
+      "                                           accuracy                           0.88      6540\n",
+      "                                          macro avg       0.90      0.62      0.71      6540\n",
+      "                                       weighted avg       0.88      0.88      0.88      6540\n",
+      "\n",
+      "Accuracy: 0.8831804281345565\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[1]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5cb1853-9bc1-4541-9dac-5cb208abcfc5",
+   "metadata": {},
+   "source": [
+    "#### 3. Problem with a credit reporting company's investigation into an existing problem"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "id": "86f04fd6-7625-4aba-9094-f7025078d1fc",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Problem with a credit reporting company's investigation into an existing problem\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                       precision    recall  f1-score   support\n",
+      "\n",
+      "Difficulty submitting a dispute or getting information about a dispute over the phone       0.83      0.36      0.50        83\n",
+      "                                                 Investigation took more than 30 days       0.97      0.84      0.90       505\n",
+      "                                           Problem with personal statement of dispute       1.00      0.38      0.55        47\n",
+      "                              Their investigation did not fix an error on your report       0.92      0.99      0.95      2277\n",
+      "                                  Was not notified of investigation status or results       0.96      0.88      0.92       473\n",
+      "\n",
+      "                                                                             accuracy                           0.93      3385\n",
+      "                                                                            macro avg       0.94      0.69      0.77      3385\n",
+      "                                                                         weighted avg       0.93      0.93      0.92      3385\n",
+      "\n",
+      "Accuracy: 0.9288035450516987\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[2]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f00b115b-46c4-4d46-adae-a10a5e92a839",
+   "metadata": {},
+   "source": [
+    "#### 4. Problem with a purchase shown on your statement"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "id": "e6577c57-6caa-4221-a68b-e0b65e739511",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Problem with a purchase shown on your statement\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                  precision    recall  f1-score   support\n",
+      "\n",
+      "               Card was charged for something you did not purchase with the card       0.81      0.19      0.30        70\n",
+      "Credit card company isn't resolving a dispute about a purchase on your statement       0.75      0.98      0.85       172\n",
+      "\n",
+      "                                                                        accuracy                           0.75       242\n",
+      "                                                                       macro avg       0.78      0.58      0.58       242\n",
+      "                                                                    weighted avg       0.77      0.75      0.69       242\n",
+      "\n",
+      "Accuracy: 0.7520661157024794\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[3]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a8648f75-e62d-4b80-b4ed-ccf104137c74",
+   "metadata": {},
+   "source": [
+    "#### 5. Improper use of your report"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "id": "ea64cabb-1372-4a52-826f-8b1bf8f2cb32",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Improper use of your report\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                          precision    recall  f1-score   support\n",
+      "\n",
+      "Credit inquiries on your report that you don't recognize       0.93      0.84      0.88       990\n",
+      "           Reporting company used your report improperly       0.96      0.98      0.97      3654\n",
+      "\n",
+      "                                                accuracy                           0.95      4644\n",
+      "                                               macro avg       0.95      0.91      0.93      4644\n",
+      "                                            weighted avg       0.95      0.95      0.95      4644\n",
+      "\n",
+      "Accuracy: 0.9528423772609819\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[4]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f48f3308-d884-440c-8a24-8a81e7140ee0",
+   "metadata": {},
+   "source": [
+    "#### 6. Account Operations and Unauthorized Transaction Issues"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "id": "08ec2d0e-950e-4f6d-9cdb-8328fed17384",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Account Operations and Unauthorized Transaction Issues\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                  precision    recall  f1-score   support\n",
+      "\n",
+      "             Account opened as a result of fraud       0.83      0.67      0.74        43\n",
+      "Card opened as result of identity theft or fraud       0.88      0.77      0.82        39\n",
+      "                  Transaction was not authorized       0.86      0.97      0.91       102\n",
+      "\n",
+      "                                        accuracy                           0.86       184\n",
+      "                                       macro avg       0.86      0.80      0.83       184\n",
+      "                                    weighted avg       0.86      0.86      0.85       184\n",
+      "\n",
+      "Accuracy: 0.8586956521739131\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[5]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7c7332c0-3cc9-42b6-9bbd-5b33719e676d",
+   "metadata": {},
+   "source": [
+    "#### 7. Payment and Funds Management"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "id": "bf0e0437-a85d-4dcd-8b93-982fbd33cee6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Payment and Funds Management\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                precision    recall  f1-score   support\n",
+      "\n",
+      "               Billing problem       1.00      0.65      0.79        34\n",
+      " Overdrafts and overdraft fees       0.89      0.92      0.91        74\n",
+      "Problem during payment process       0.81      0.94      0.87        65\n",
+      "\n",
+      "                      accuracy                           0.87       173\n",
+      "                     macro avg       0.90      0.83      0.85       173\n",
+      "                  weighted avg       0.88      0.87      0.87       173\n",
+      "\n",
+      "Accuracy: 0.8728323699421965\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[6]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b034a174-16e7-41b6-970c-ef23d9b9da29",
+   "metadata": {},
+   "source": [
+    "#### 8. Managing an account"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "id": "bc62e5f5-14ef-4d8a-8434-79b4e7da5a9a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Managing an account\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                              precision    recall  f1-score   support\n",
+      "\n",
+      "                              Banking errors       0.50      0.10      0.16        73\n",
+      "                    Deposits and withdrawals       0.46      0.90      0.61       201\n",
+      "                                 Fee problem       0.55      0.57      0.56        56\n",
+      "Funds not handled or disbursed as instructed       0.00      0.00      0.00        72\n",
+      "                   Problem accessing account       0.00      0.00      0.00        40\n",
+      "           Problem using a debit or ATM card       0.71      0.58      0.64       113\n",
+      "\n",
+      "                                    accuracy                           0.52       555\n",
+      "                                   macro avg       0.37      0.36      0.33       555\n",
+      "                                weighted avg       0.43      0.52      0.43       555\n",
+      "\n",
+      "Accuracy: 0.5153153153153153\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[7]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6c2e3454-eaa2-4a71-a058-988ad7716eac",
+   "metadata": {},
+   "source": [
+    "#### 9. Attempts to collect debt not owed"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "id": "85ad1ffc-97e5-436b-afea-abed93b67b75",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Attempts to collect debt not owed\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                   precision    recall  f1-score   support\n",
+      "\n",
+      "                Debt is not yours       0.64      0.93      0.76       207\n",
+      "                    Debt was paid       0.96      0.31      0.46        72\n",
+      "Debt was result of identity theft       0.84      0.56      0.67       129\n",
+      "\n",
+      "                         accuracy                           0.70       408\n",
+      "                        macro avg       0.81      0.60      0.63       408\n",
+      "                     weighted avg       0.76      0.70      0.68       408\n",
+      "\n",
+      "Accuracy: 0.7009803921568627\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[8]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "43b186f0-b626-43c2-9823-6818da478d48",
+   "metadata": {},
+   "source": [
+    "-----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d87e677-da08-4682-9823-72c8315e52a2",
+   "metadata": {},
+   "source": [
+    "#### 10. Written notification about debt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "id": "214fc01d-7bf1-4b5a-b409-10b3c99076ae",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Written notification about debt\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                  precision    recall  f1-score   support\n",
+      "\n",
+      "Didn't receive enough information to verify debt       0.77      0.99      0.87       135\n",
+      "       Didn't receive notice of right to dispute       0.90      0.19      0.31        48\n",
+      "\n",
+      "                                        accuracy                           0.78       183\n",
+      "                                       macro avg       0.84      0.59      0.59       183\n",
+      "                                    weighted avg       0.81      0.78      0.72       183\n",
+      "\n",
+      "Accuracy: 0.7814207650273224\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[9]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7cca2ba7-f0e1-4e56-a6f0-2a3c92bcac56",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "401e87db-4759-437c-bcb1-382a7f8ed226",
+   "metadata": {},
+   "source": [
+    "#### 11. Dealing with your lender or servicer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "id": "9c1485fc-1b14-44c9-b4c9-d92bea864800",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Dealing with your lender or servicer\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                             precision    recall  f1-score   support\n",
+      "\n",
+      "   Received bad information about your loan       0.74      0.70      0.72        50\n",
+      "Trouble with how payments are being handled       0.71      0.75      0.73        48\n",
+      "\n",
+      "                                   accuracy                           0.72        98\n",
+      "                                  macro avg       0.73      0.72      0.72        98\n",
+      "                               weighted avg       0.73      0.72      0.72        98\n",
+      "\n",
+      "Accuracy: 0.7244897959183674\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[10]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8ca1aab7-158f-48bf-871c-1fa991fb1f9e",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "36ce1724-61e5-4d5b-bbaf-a79293af6506",
+   "metadata": {},
+   "source": [
+    "#### 12. Disputes and Misrepresentations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "id": "380ee173-6c72-40b8-9eb2-a5af680c8ff7",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Disputes and Misrepresentations\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                   precision    recall  f1-score   support\n",
+      "\n",
+      "Attempted to collect wrong amount       0.85      0.92      0.88        66\n",
+      "                    Other problem       0.85      0.65      0.74        54\n",
+      "                Problem with fees       0.83      0.93      0.88        57\n",
+      "\n",
+      "                         accuracy                           0.84       177\n",
+      "                        macro avg       0.84      0.83      0.83       177\n",
+      "                     weighted avg       0.84      0.84      0.84       177\n",
+      "\n",
+      "Accuracy: 0.8418079096045198\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[11]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e44501a4-2021-4d78-b3c2-c937d286cb22",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "451ccf3a-c97e-46e3-9c47-c225d6e3dd49",
+   "metadata": {},
+   "source": [
+    "#### 13. Problem with a company's investigation into an existing issue"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 45,
+   "id": "20201d0c-b9da-4e2e-957b-23649f06e48e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Problem with a company's investigation into an existing issue\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                       precision    recall  f1-score   support\n",
+      "\n",
+      "Difficulty submitting a dispute or getting information about a dispute over the phone       0.00      0.00      0.00         3\n",
+      "                                                 Investigation took more than 30 days       1.00      1.00      1.00         3\n",
+      "                                           Problem with personal statement of dispute       0.00      0.00      0.00         2\n",
+      "                              Their investigation did not fix an error on your report       0.50      1.00      0.67         7\n",
+      "                                  Was not notified of investigation status or results       0.00      0.00      0.00         2\n",
+      "\n",
+      "                                                                             accuracy                           0.59        17\n",
+      "                                                                            macro avg       0.30      0.40      0.33        17\n",
+      "                                                                         weighted avg       0.38      0.59      0.45        17\n",
+      "\n",
+      "Accuracy: 0.5882352941176471\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[12]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c5d37ff8-2382-4c3b-aef0-5affd4d3083b",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c9876639-9e72-49ab-9dd4-3ef5ac38a8d8",
+   "metadata": {},
+   "source": [
+    "#### 14. Closing your account"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "id": "95eff365-09f8-4640-9f65-4a82fc321fa9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Closing your account\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                             precision    recall  f1-score   support\n",
+      "\n",
+      "   Can't close your account       1.00      0.24      0.38        17\n",
+      "Company closed your account       0.78      1.00      0.88        46\n",
+      "\n",
+      "                   accuracy                           0.79        63\n",
+      "                  macro avg       0.89      0.62      0.63        63\n",
+      "               weighted avg       0.84      0.79      0.74        63\n",
+      "\n",
+      "Accuracy: 0.7936507936507936\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[13]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c66b9044-32af-4aee-af08-b685480d9f53",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "455f8d69-5531-42e0-a53c-66427ff68fcc",
+   "metadata": {},
+   "source": [
+    "#### 15. Credit Report and Monitoring Issues"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 47,
+   "id": "a039cb86-3503-4757-a8ee-7e518eafb9a5",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Credit Report and Monitoring Issues\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                          precision    recall  f1-score   support\n",
+      "\n",
+      "                       Other problem getting your report or credit score       0.89      0.99      0.94        82\n",
+      "Problem canceling credit monitoring or identify theft protection service       0.97      0.75      0.85        40\n",
+      "\n",
+      "                                                                accuracy                           0.91       122\n",
+      "                                                               macro avg       0.93      0.87      0.89       122\n",
+      "                                                            weighted avg       0.92      0.91      0.91       122\n",
+      "\n",
+      "Accuracy: 0.9098360655737705\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[14]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee0dfc45-96b2-4cbb-b34d-a8e1441c0c82",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0dcf3701-d59f-43fa-9aa0-2c65c27a8fe0",
+   "metadata": {},
+   "source": [
+    "#### 16. Closing an account"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 48,
+   "id": "1ed7956b-3d41-46f8-a7e8-ad9f36e1694d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Closing an account\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                        precision    recall  f1-score   support\n",
+      "\n",
+      "              Can't close your account       1.00      0.04      0.07        27\n",
+      "           Company closed your account       0.57      0.83      0.67        69\n",
+      "Funds not received from closed account       0.56      0.50      0.53        50\n",
+      "\n",
+      "                              accuracy                           0.57       146\n",
+      "                             macro avg       0.71      0.45      0.42       146\n",
+      "                          weighted avg       0.64      0.57      0.51       146\n",
+      "\n",
+      "Accuracy: 0.5684931506849316\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[15]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3822541c-f13c-4a96-862f-4c23cf2d3895",
+   "metadata": {},
+   "source": [
+    "#### 17. Legal and Threat Actions"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 49,
+   "id": "8fa5fc40-6d4f-4321-8eb0-9608dc5b84e2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Legal and Threat Actions\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                      precision    recall  f1-score   support\n",
+      "\n",
+      "Threatened or suggested your credit would be damaged       1.00      1.00      1.00        48\n",
+      "\n",
+      "                                            accuracy                           1.00        48\n",
+      "                                           macro avg       1.00      1.00      1.00        48\n",
+      "                                        weighted avg       1.00      1.00      1.00        48\n",
+      "\n",
+      "Accuracy: 1.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[16]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.19"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
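
Several reports above show minority sub-issues with high precision but very low recall; for example, "Can't close your account" under "Closing an account" reaches recall 0.04. Class weighting is one common mitigation. The following is a minimal sketch, not the author's method, of the "balanced" heuristic documented for scikit-learn's `compute_class_weight`, where each class gets `n_samples / (n_classes * count)`. The helper name `balanced_class_weights` is hypothetical, and the label counts reuse the validation supports from the report as a stand-in, since the training counts are not shown in this diff.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {label: n_samples / (n_classes * count)} for each label.

    Hypothetical helper that mirrors the 'balanced' heuristic using only
    the standard library.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Illustrative label list built from the validation supports reported for
# "Closing an account" (27 / 69 / 50). Real weights would be computed on the
# training split, whose counts are an unknown here.
labels = (
    ["Can't close your account"] * 27
    + ["Company closed your account"] * 69
    + ["Funds not received from closed account"] * 50
)

weights = balanced_class_weights(labels)
for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.3f}")
```

A dictionary of this shape could be passed as the `class_weight` argument of `RandomForestClassifier`, or `class_weight='balanced'` could be set directly. Whether either choice improves minority-class recall on this data is untested.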

subproduct_prediction/Pipeline.ipynb ADDED

The diff for this file is too large to render.

subproduct_prediction/Sub_Issue.ipynb ADDED

	@@ -0,0 +1,990 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "a751d479-1500-41e2-8c01-252e849dad05",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import warnings\n",
+    "warnings.filterwarnings(\"ignore\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "8158cb66-9f9a-4bb2-bc6e-6a51146be10c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import matplotlib.pyplot as plt\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+    "from sklearn.pipeline import make_pipeline\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "from sklearn.naive_bayes import MultinomialNB\n",
+    "from sklearn.svm import SVC\n",
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "from sklearn.metrics import classification_report, accuracy_score\n",
+    "import numpy as np\n",
+    "from sklearn.preprocessing import OneHotEncoder\n",
+    "from sklearn.compose import ColumnTransformer\n",
+    "from sklearn.pipeline import Pipeline\n",
+    "from sklearn.utils.class_weight import compute_class_weight\n",
+    "import pickle"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "70ea935b-3b62-4cf9-8bef-06bf30904b20",
+   "metadata": {},
+   "source": [
+    "## Sub Issues"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f9ddaa89-dc8d-40f5-8098-7d108ab9d578",
+   "metadata": {},
+   "source": [
+    "### Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 29,
+   "id": "c1f9fd85-f47e-4962-a693-7cb9efca763a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.pipeline import Pipeline\n",
+    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+    "from sklearn.metrics import accuracy_score, classification_report\n",
+    "from sklearn.utils.class_weight import compute_class_weight\n",
+    "\n",
+    "def train_model(training_df, validation_df, target_column, classifier_model, subissues_to_drop=None, random_state=42):\n",
+    "    # Drop the specified sub-issues from the training and validation dataframes\n",
+    "    if subissues_to_drop:\n",
+    "        training_df = training_df[~training_df[target_column].isin(subissues_to_drop)]\n",
+    "        validation_df = validation_df[~validation_df[target_column].isin(subissues_to_drop)]\n",
+    "    \n",
+    "    # Compute balanced class weights for the remaining sub-issues\n",
+    "    classes = np.unique(training_df[target_column])\n",
+    "    class_weights = compute_class_weight('balanced', classes=classes, y=training_df[target_column])\n",
+    "    \n",
+    "    # Convert class weights to dictionary format.\n",
+    "    # NOTE: this dictionary is not passed to classifier_model, so the weights\n",
+    "    # have no effect on training unless the caller sets class_weight itself.\n",
+    "    class_weight = dict(zip(classes, class_weights))\n",
+    "    \n",
+    "    # Define the pipeline\n",
+    "    pipeline = Pipeline([\n",
+    "        ('tfidf', TfidfVectorizer()),\n",
+    "        ('classifier', classifier_model)\n",
+    "    ])\n",
+    "    \n",
+    "    # Train the pipeline\n",
+    "    pipeline.fit(training_df['Consumer complaint narrative'], training_df[target_column])\n",
+    "    \n",
+    "    # Make predictions on the validation set\n",
+    "    y_pred = pipeline.predict(validation_df['Consumer complaint narrative'])\n",
+    "    \n",
+    "    # Evaluate the pipeline\n",
+    "    accuracy = accuracy_score(validation_df[target_column], y_pred)\n",
+    "    print(\"\\nClassification Report:\")\n",
+    "    print(classification_report(validation_df[target_column], y_pred))\n",
+    "    print(\"Accuracy:\", accuracy)\n",
+    "    \n",
+    "    return pipeline"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7a0d277-75c1-4435-86e5-d0ee7d3dabf3",
+   "metadata": {},
+   "source": [
+    "#### Reading the Issue DataFrame"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "id": "c1ea3fbc-4062-483b-a5c6-65d644983ce5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import pandas as pd\n",
+    "\n",
+    "def read_subissue_data(issue_name, data_dir='../data_preprocessing_scripts/issue_data_splits'):\n",
+    "    # Convert issue name to lower case and replace '/' and spaces with underscores\n",
+    "    issue_name = issue_name.replace('/', '_').replace(' ', '_').lower()\n",
+    "    \n",
+    "    # Construct file paths\n",
+    "    train_file = os.path.join(data_dir, f\"{issue_name}_train_data.csv\")\n",
+    "    val_file = os.path.join(data_dir, f\"{issue_name}_val_data.csv\")\n",
+    "    \n",
+    "    # Read the CSV files\n",
+    "    train_df = pd.read_csv(train_file)\n",
+    "    val_df = pd.read_csv(val_file)\n",
+    "    \n",
+    "    return train_df, val_df"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "id": "ae74f945-3fe9-4207-8fe0-fb4d8c5d2a27",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "df = pd.read_csv(\"../data_splits/train-data-split.csv\")\n",
+    "issue_categories = list(df['Issue'].unique())\n",
+    "\n",
+    "def classify_sub_issue(issue):\n",
+    "    issue_name = issue.replace('/', '_').replace(' ', '_').lower()\n",
+    "    train_df, val_df = read_subissue_data(issue)\n",
+    "    rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "    trained_model = train_model(train_df, val_df, 'Sub-issue', rf_classifier, random_state=42)\n",
+    "\n",
+    "    # Saving the model\n",
+    "    with open(f\"issue_models/{issue_name}.pkl\", 'wb') as f:\n",
+    "        pickle.dump(trained_model, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0540f68f-4e14-40c2-ba9e-1875138678a1",
+   "metadata": {},
+   "source": [
+    "### Sub-issues classification"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7a53f046-c7f8-48de-a8f3-9a66ffad5f55",
+   "metadata": {},
+   "source": [
+    "#### 1. Problem with a company's investigation into an existing problem"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "id": "a33a3974-b3e9-466c-85a9-8d9b0255bbba",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Problem with a company's investigation into an existing problem\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                       precision    recall  f1-score   support\n",
+      "\n",
+      "Difficulty submitting a dispute or getting information about a dispute over the phone       0.88      0.37      0.52        41\n",
+      "                                                 Investigation took more than 30 days       0.95      0.73      0.83       162\n",
+      "                                           Problem with personal statement of dispute       0.90      0.53      0.67        53\n",
+      "                              Their investigation did not fix an error on your report       0.91      1.00      0.95      1122\n",
+      "                                  Was not notified of investigation status or results       0.98      0.87      0.92       209\n",
+      "\n",
+      "                                                                             accuracy                           0.92      1587\n",
+      "                                                                            macro avg       0.93      0.70      0.78      1587\n",
+      "                                                                         weighted avg       0.92      0.92      0.91      1587\n",
+      "\n",
+      "Accuracy: 0.9199747952110902\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[0]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4ffa280b-614f-48b2-9870-70fb053b45b6",
+   "metadata": {},
+   "source": [
+    "#### 2. Incorrect information on your report"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "id": "3d431635-227e-4873-b017-8cb4180a6e2e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Incorrect information on your report\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                     precision    recall  f1-score   support\n",
+      "\n",
+      "                      Account information incorrect       0.74      0.68      0.71       699\n",
+      "                           Account status incorrect       0.87      0.73      0.79       771\n",
+      "                Information belongs to someone else       0.90      0.99      0.94      4337\n",
+      "Information is missing that should be on the report       0.95      0.31      0.47        65\n",
+      "       Old information reappears or never goes away       0.93      0.40      0.56       126\n",
+      "                     Personal information incorrect       0.95      0.78      0.86       440\n",
+      "               Public record information inaccurate       0.98      0.47      0.64       102\n",
+      "\n",
+      "                                           accuracy                           0.88      6540\n",
+      "                                          macro avg       0.90      0.62      0.71      6540\n",
+      "                                       weighted avg       0.88      0.88      0.88      6540\n",
+      "\n",
+      "Accuracy: 0.8831804281345565\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[1]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f5cb1853-9bc1-4541-9dac-5cb208abcfc5",
+   "metadata": {},
+   "source": [
+    "#### 3. Problem with a credit reporting company's investigation into an existing problem"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "id": "86f04fd6-7625-4aba-9094-f7025078d1fc",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Problem with a credit reporting company's investigation into an existing problem\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                       precision    recall  f1-score   support\n",
+      "\n",
+      "Difficulty submitting a dispute or getting information about a dispute over the phone       0.83      0.36      0.50        83\n",
+      "                                                 Investigation took more than 30 days       0.97      0.84      0.90       505\n",
+      "                                           Problem with personal statement of dispute       1.00      0.38      0.55        47\n",
+      "                              Their investigation did not fix an error on your report       0.92      0.99      0.95      2277\n",
+      "                                  Was not notified of investigation status or results       0.96      0.88      0.92       473\n",
+      "\n",
+      "                                                                             accuracy                           0.93      3385\n",
+      "                                                                            macro avg       0.94      0.69      0.77      3385\n",
+      "                                                                         weighted avg       0.93      0.93      0.92      3385\n",
+      "\n",
+      "Accuracy: 0.9288035450516987\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[2]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f00b115b-46c4-4d46-adae-a10a5e92a839",
+   "metadata": {},
+   "source": [
+    "#### 4. Problem with a purchase shown on your statement"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "id": "e6577c57-6caa-4221-a68b-e0b65e739511",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Problem with a purchase shown on your statement\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                  precision    recall  f1-score   support\n",
+      "\n",
+      "               Card was charged for something you did not purchase with the card       0.81      0.19      0.30        70\n",
+      "Credit card company isn't resolving a dispute about a purchase on your statement       0.75      0.98      0.85       172\n",
+      "\n",
+      "                                                                        accuracy                           0.75       242\n",
+      "                                                                       macro avg       0.78      0.58      0.58       242\n",
+      "                                                                    weighted avg       0.77      0.75      0.69       242\n",
+      "\n",
+      "Accuracy: 0.7520661157024794\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[3]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a8648f75-e62d-4b80-b4ed-ccf104137c74",
+   "metadata": {},
+   "source": [
+    "#### 5. Improper use of your report"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 37,
+   "id": "ea64cabb-1372-4a52-826f-8b1bf8f2cb32",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Improper use of your report\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                          precision    recall  f1-score   support\n",
+      "\n",
+      "Credit inquiries on your report that you don't recognize       0.93      0.84      0.88       990\n",
+      "           Reporting company used your report improperly       0.96      0.98      0.97      3654\n",
+      "\n",
+      "                                                accuracy                           0.95      4644\n",
+      "                                               macro avg       0.95      0.91      0.93      4644\n",
+      "                                            weighted avg       0.95      0.95      0.95      4644\n",
+      "\n",
+      "Accuracy: 0.9528423772609819\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[4]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f48f3308-d884-440c-8a24-8a81e7140ee0",
+   "metadata": {},
+   "source": [
+    "#### 6. Account Operations and Unauthorized Transaction Issues"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 38,
+   "id": "08ec2d0e-950e-4f6d-9cdb-8328fed17384",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Account Operations and Unauthorized Transaction Issues\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                  precision    recall  f1-score   support\n",
+      "\n",
+      "             Account opened as a result of fraud       0.83      0.67      0.74        43\n",
+      "Card opened as result of identity theft or fraud       0.88      0.77      0.82        39\n",
+      "                  Transaction was not authorized       0.86      0.97      0.91       102\n",
+      "\n",
+      "                                        accuracy                           0.86       184\n",
+      "                                       macro avg       0.86      0.80      0.83       184\n",
+      "                                    weighted avg       0.86      0.86      0.85       184\n",
+      "\n",
+      "Accuracy: 0.8586956521739131\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[5]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7c7332c0-3cc9-42b6-9bbd-5b33719e676d",
+   "metadata": {},
+   "source": [
+    "#### 7. Payment and Funds Management"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 39,
+   "id": "bf0e0437-a85d-4dcd-8b93-982fbd33cee6",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Payment and Funds Management\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                precision    recall  f1-score   support\n",
+      "\n",
+      "               Billing problem       1.00      0.65      0.79        34\n",
+      " Overdrafts and overdraft fees       0.89      0.92      0.91        74\n",
+      "Problem during payment process       0.81      0.94      0.87        65\n",
+      "\n",
+      "                      accuracy                           0.87       173\n",
+      "                     macro avg       0.90      0.83      0.85       173\n",
+      "                  weighted avg       0.88      0.87      0.87       173\n",
+      "\n",
+      "Accuracy: 0.8728323699421965\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[6]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "b034a174-16e7-41b6-970c-ef23d9b9da29",
+   "metadata": {},
+   "source": [
+    "#### 8. Managing an account"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "id": "bc62e5f5-14ef-4d8a-8434-79b4e7da5a9a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Managing an account\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                              precision    recall  f1-score   support\n",
+      "\n",
+      "                              Banking errors       0.50      0.10      0.16        73\n",
+      "                    Deposits and withdrawals       0.46      0.90      0.61       201\n",
+      "                                 Fee problem       0.55      0.57      0.56        56\n",
+      "Funds not handled or disbursed as instructed       0.00      0.00      0.00        72\n",
+      "                   Problem accessing account       0.00      0.00      0.00        40\n",
+      "           Problem using a debit or ATM card       0.71      0.58      0.64       113\n",
+      "\n",
+      "                                    accuracy                           0.52       555\n",
+      "                                   macro avg       0.37      0.36      0.33       555\n",
+      "                                weighted avg       0.43      0.52      0.43       555\n",
+      "\n",
+      "Accuracy: 0.5153153153153153\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[7]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6c2e3454-eaa2-4a71-a058-988ad7716eac",
+   "metadata": {},
+   "source": [
+    "#### 9. Attempts to collect debt not owed"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "id": "85ad1ffc-97e5-436b-afea-abed93b67b75",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Attempts to collect debt not owed\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                   precision    recall  f1-score   support\n",
+      "\n",
+      "                Debt is not yours       0.64      0.93      0.76       207\n",
+      "                    Debt was paid       0.96      0.31      0.46        72\n",
+      "Debt was result of identity theft       0.84      0.56      0.67       129\n",
+      "\n",
+      "                         accuracy                           0.70       408\n",
+      "                        macro avg       0.81      0.60      0.63       408\n",
+      "                     weighted avg       0.76      0.70      0.68       408\n",
+      "\n",
+      "Accuracy: 0.7009803921568627\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[8]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "43b186f0-b626-43c2-9823-6818da478d48",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8d87e677-da08-4682-9823-72c8315e52a2",
+   "metadata": {},
+   "source": [
+    "#### 10. Written notification about debt"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "id": "214fc01d-7bf1-4b5a-b409-10b3c99076ae",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Written notification about debt\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                  precision    recall  f1-score   support\n",
+      "\n",
+      "Didn't receive enough information to verify debt       0.77      0.99      0.87       135\n",
+      "       Didn't receive notice of right to dispute       0.90      0.19      0.31        48\n",
+      "\n",
+      "                                        accuracy                           0.78       183\n",
+      "                                       macro avg       0.84      0.59      0.59       183\n",
+      "                                    weighted avg       0.81      0.78      0.72       183\n",
+      "\n",
+      "Accuracy: 0.7814207650273224\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[9]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7cca2ba7-f0e1-4e56-a6f0-2a3c92bcac56",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "401e87db-4759-437c-bcb1-382a7f8ed226",
+   "metadata": {},
+   "source": [
+    "#### 11. Dealing with your lender or servicer"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 43,
+   "id": "9c1485fc-1b14-44c9-b4c9-d92bea864800",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Dealing with your lender or servicer\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                             precision    recall  f1-score   support\n",
+      "\n",
+      "   Received bad information about your loan       0.74      0.70      0.72        50\n",
+      "Trouble with how payments are being handled       0.71      0.75      0.73        48\n",
+      "\n",
+      "                                   accuracy                           0.72        98\n",
+      "                                  macro avg       0.73      0.72      0.72        98\n",
+      "                               weighted avg       0.73      0.72      0.72        98\n",
+      "\n",
+      "Accuracy: 0.7244897959183674\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[10]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8ca1aab7-158f-48bf-871c-1fa991fb1f9e",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "36ce1724-61e5-4d5b-bbaf-a79293af6506",
+   "metadata": {},
+   "source": [
+    "#### 12. Disputes and Misrepresentations"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 44,
+   "id": "380ee173-6c72-40b8-9eb2-a5af680c8ff7",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Disputes and Misrepresentations\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                   precision    recall  f1-score   support\n",
+      "\n",
+      "Attempted to collect wrong amount       0.85      0.92      0.88        66\n",
+      "                    Other problem       0.85      0.65      0.74        54\n",
+      "                Problem with fees       0.83      0.93      0.88        57\n",
+      "\n",
+      "                         accuracy                           0.84       177\n",
+      "                        macro avg       0.84      0.83      0.83       177\n",
+      "                     weighted avg       0.84      0.84      0.84       177\n",
+      "\n",
+      "Accuracy: 0.8418079096045198\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[11]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e44501a4-2021-4d78-b3c2-c937d286cb22",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "451ccf3a-c97e-46e3-9c47-c225d6e3dd49",
+   "metadata": {},
+   "source": [
+    "#### 13. Problem with a company's investigation into an existing issue"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 45,
+   "id": "20201d0c-b9da-4e2e-957b-23649f06e48e",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Problem with a company's investigation into an existing issue\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                                       precision    recall  f1-score   support\n",
+      "\n",
+      "Difficulty submitting a dispute or getting information about a dispute over the phone       0.00      0.00      0.00         3\n",
+      "                                                 Investigation took more than 30 days       1.00      1.00      1.00         3\n",
+      "                                           Problem with personal statement of dispute       0.00      0.00      0.00         2\n",
+      "                              Their investigation did not fix an error on your report       0.50      1.00      0.67         7\n",
+      "                                  Was not notified of investigation status or results       0.00      0.00      0.00         2\n",
+      "\n",
+      "                                                                             accuracy                           0.59        17\n",
+      "                                                                            macro avg       0.30      0.40      0.33        17\n",
+      "                                                                         weighted avg       0.38      0.59      0.45        17\n",
+      "\n",
+      "Accuracy: 0.5882352941176471\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[12]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c5d37ff8-2382-4c3b-aef0-5affd4d3083b",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c9876639-9e72-49ab-9dd4-3ef5ac38a8d8",
+   "metadata": {},
+   "source": [
+    "#### 14. Closing your account"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 46,
+   "id": "95eff365-09f8-4640-9f65-4a82fc321fa9",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Closing your account\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                             precision    recall  f1-score   support\n",
+      "\n",
+      "   Can't close your account       1.00      0.24      0.38        17\n",
+      "Company closed your account       0.78      1.00      0.88        46\n",
+      "\n",
+      "                   accuracy                           0.79        63\n",
+      "                  macro avg       0.89      0.62      0.63        63\n",
+      "               weighted avg       0.84      0.79      0.74        63\n",
+      "\n",
+      "Accuracy: 0.7936507936507936\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[13]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c66b9044-32af-4aee-af08-b685480d9f53",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "455f8d69-5531-42e0-a53c-66427ff68fcc",
+   "metadata": {},
+   "source": [
+    "#### 15. Credit Report and Monitoring Issues"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 47,
+   "id": "a039cb86-3503-4757-a8ee-7e518eafb9a5",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Credit Report and Monitoring Issues\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                                          precision    recall  f1-score   support\n",
+      "\n",
+      "                       Other problem getting your report or credit score       0.89      0.99      0.94        82\n",
+      "Problem canceling credit monitoring or identify theft protection service       0.97      0.75      0.85        40\n",
+      "\n",
+      "                                                                accuracy                           0.91       122\n",
+      "                                                               macro avg       0.93      0.87      0.89       122\n",
+      "                                                            weighted avg       0.92      0.91      0.91       122\n",
+      "\n",
+      "Accuracy: 0.9098360655737705\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[14]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ee0dfc45-96b2-4cbb-b34d-a8e1441c0c82",
+   "metadata": {},
+   "source": [
+    "----"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0dcf3701-d59f-43fa-9aa0-2c65c27a8fe0",
+   "metadata": {},
+   "source": [
+    "#### 16. Closing an account"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 48,
+   "id": "1ed7956b-3d41-46f8-a7e8-ad9f36e1694d",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Closing an account\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                        precision    recall  f1-score   support\n",
+      "\n",
+      "              Can't close your account       1.00      0.04      0.07        27\n",
+      "           Company closed your account       0.57      0.83      0.67        69\n",
+      "Funds not received from closed account       0.56      0.50      0.53        50\n",
+      "\n",
+      "                              accuracy                           0.57       146\n",
+      "                             macro avg       0.71      0.45      0.42       146\n",
+      "                          weighted avg       0.64      0.57      0.51       146\n",
+      "\n",
+      "Accuracy: 0.5684931506849316\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[15]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3822541c-f13c-4a96-862f-4c23cf2d3895",
+   "metadata": {},
+   "source": [
+    "#### 17. Legal and Threat Actions"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 49,
+   "id": "8fa5fc40-6d4f-4321-8eb0-9608dc5b84e2",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Issue : Legal and Threat Actions\n",
+      "\n",
+      "\n",
+      "Classification Report:\n",
+      "                                                      precision    recall  f1-score   support\n",
+      "\n",
+      "Threatened or suggested your credit would be damaged       1.00      1.00      1.00        48\n",
+      "\n",
+      "                                            accuracy                           1.00        48\n",
+      "                                           macro avg       1.00      1.00      1.00        48\n",
+      "                                        weighted avg       1.00      1.00      1.00        48\n",
+      "\n",
+      "Accuracy: 1.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "issue_name = issue_categories[16]\n",
+    "print(f\"Issue : {issue_name}\\n\")\n",
+    "\n",
+    "classify_sub_issue(issue_name)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.19"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
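
Review note on the Sub_Issue.ipynb reports above: several issues show a large gap between the macro and weighted averages, for example "Written notification about debt", where the minority class has a recall of 0.19. The sketch below recomputes both averages from the rounded per-class rows of that report, to show how the minority class is masked in the weighted figure. The helper names `macro_avg` and `weighted_avg` are illustrative and not part of the notebook. Because the inputs are rounded to two decimals, the results can differ from the report in the last digit.

```python
# Per-class rows copied from the "Written notification about debt" report.
# Each value is (precision, recall, f1-score, support).
rows = {
    "Didn't receive enough information to verify debt": (0.77, 0.99, 0.87, 135),
    "Didn't receive notice of right to dispute": (0.90, 0.19, 0.31, 48),
}

RECALL, F1, SUPPORT = 1, 2, 3


def macro_avg(rows, idx):
    """Unweighted mean over classes: every class counts equally."""
    values = [row[idx] for row in rows.values()]
    return sum(values) / len(values)


def weighted_avg(rows, idx):
    """Mean over classes weighted by support: large classes dominate."""
    total = sum(row[SUPPORT] for row in rows.values())
    return sum(row[idx] * row[SUPPORT] for row in rows.values()) / total


macro_recall = macro_avg(rows, RECALL)
weighted_recall = weighted_avg(rows, RECALL)
macro_f1 = macro_avg(rows, F1)
weighted_f1 = weighted_avg(rows, F1)

print(f"recall: macro={macro_recall:.2f} weighted={weighted_recall:.2f}")
print(f"f1:     macro={macro_f1:.2f} weighted={weighted_f1:.2f}")
```

For single-label classification the support-weighted recall equals the overall accuracy, so the macro average is the more informative number when the goal is to see how the smaller sub-issues fare.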

subproduct_prediction/Sub_Product.ipynb ADDED

	@@ -0,0 +1,700 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "a751d479-1500-41e2-8c01-252e849dad05",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import warnings\n",
+    "warnings.filterwarnings(\"ignore\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "8158cb66-9f9a-4bb2-bc6e-6a51146be10c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pickle\n",
+    "\n",
+    "import matplotlib.pyplot as plt\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "from sklearn.compose import ColumnTransformer\n",
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+    "from sklearn.linear_model import LogisticRegression\n",
+    "from sklearn.metrics import accuracy_score, classification_report\n",
+    "from sklearn.model_selection import train_test_split\n",
+    "from sklearn.naive_bayes import MultinomialNB\n",
+    "from sklearn.pipeline import Pipeline, make_pipeline\n",
+    "from sklearn.preprocessing import OneHotEncoder\n",
+    "from sklearn.svm import SVC\n",
+    "from sklearn.utils.class_weight import compute_class_weight"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "70ea935b-3b62-4cf9-8bef-06bf30904b20",
+   "metadata": {},
+   "source": [
+    "## Sub Products"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f9ddaa89-dc8d-40f5-8098-7d108ab9d578",
+   "metadata": {},
+   "source": [
+    "### Model"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "c1f9fd85-f47e-4962-a693-7cb9efca763a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.pipeline import Pipeline\n",
+    "from sklearn.feature_extraction.text import TfidfVectorizer\n",
+    "from sklearn.metrics import accuracy_score, classification_report\n",
+    "from sklearn.utils.class_weight import compute_class_weight\n",
+    "\n",
+    "def train_model(training_df, validation_df, subproduct_to_predict, classifier_model, subproducts_to_drop=None, random_state=None):\n",
+    "    # subproduct_to_predict is the name of the target column (e.g. 'Sub-product').\n",
+    "    # random_state is kept for call compatibility but is unused here;\n",
+    "    # seed the classifier itself when constructing it.\n",
+    "    target = subproduct_to_predict\n",
+    "    text_col = 'Consumer complaint narrative'\n",
+    "\n",
+    "    # Drop specified subproducts from training and validation dataframes\n",
+    "    if subproducts_to_drop:\n",
+    "        training_df = training_df[~training_df[target].isin(subproducts_to_drop)]\n",
+    "        validation_df = validation_df[~validation_df[target].isin(subproducts_to_drop)]\n",
+    "\n",
+    "    # Compute balanced class weights as a {label: weight} dictionary.\n",
+    "    # NOTE: the weights are not passed to the classifier, so training is unweighted.\n",
+    "    classes = np.unique(training_df[target])\n",
+    "    class_weights = compute_class_weight('balanced', classes=classes, y=training_df[target])\n",
+    "    class_weight = dict(zip(classes, class_weights))\n",
+    "\n",
+    "    # Define the pipeline: TF-IDF features followed by the supplied classifier\n",
+    "    pipeline = Pipeline([\n",
+    "        ('tfidf', TfidfVectorizer()),\n",
+    "        ('classifier', classifier_model)\n",
+    "    ])\n",
+    "\n",
+    "    # Train the pipeline\n",
+    "    pipeline.fit(training_df[text_col], training_df[target])\n",
+    "\n",
+    "    # Make predictions on the validation set\n",
+    "    y_pred = pipeline.predict(validation_df[text_col])\n",
+    "\n",
+    "    # Evaluate the pipeline\n",
+    "    accuracy = accuracy_score(validation_df[target], y_pred)\n",
+    "    print(\"Accuracy:\", accuracy)\n",
+    "    print(\"\\nClassification Report:\")\n",
+    "    print(classification_report(validation_df[target], y_pred))\n",
+    "\n",
+    "    return pipeline\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a7a0d277-75c1-4435-86e5-d0ee7d3dabf3",
+   "metadata": {},
+   "source": [
+    "#### Debt Collection"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "6a2e4857-31c7-4b57-a25c-e9e36473c033",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "debt_training_df= pd.read_csv('../data_preprocessing_scripts/product_data_splits/debt_collection_train_data.csv')\n",
+    "debt_val_df= pd.read_csv('../data_preprocessing_scripts/product_data_splits/debt_collection_val_data.csv')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "7fb6be2b-244f-4232-972c-9772128890ca",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>Consumer complaint narrative</th>\n",
+       "      <th>Product</th>\n",
+       "      <th>Sub-product</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>{$37.00} on XXXX XXXX XXXX I paid for gas thro...</td>\n",
+       "      <td>Debt collection</td>\n",
+       "      <td>Other debt</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>Debt from XXXX XXXX is result of identity thef...</td>\n",
+       "      <td>Debt collection</td>\n",
+       "      <td>Credit card debt</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>My son attended XXXX XXXX XXXX XXXX for severa...</td>\n",
+       "      <td>Debt collection</td>\n",
+       "      <td>Medical debt</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>XXXX is claiming I owe a debt for utilities ba...</td>\n",
+       "      <td>Debt collection</td>\n",
+       "      <td>Other debt</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>This debt collector engaged in abusive, decept...</td>\n",
+       "      <td>Debt collection</td>\n",
+       "      <td>I do not know</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "                        Consumer complaint narrative          Product  \\\n",
+       "0  {$37.00} on XXXX XXXX XXXX I paid for gas thro...  Debt collection   \n",
+       "1  Debt from XXXX XXXX is result of identity thef...  Debt collection   \n",
+       "2  My son attended XXXX XXXX XXXX XXXX for severa...  Debt collection   \n",
+       "3  XXXX is claiming I owe a debt for utilities ba...  Debt collection   \n",
+       "4  This debt collector engaged in abusive, decept...  Debt collection   \n",
+       "\n",
+       "        Sub-product  \n",
+       "0        Other debt  \n",
+       "1  Credit card debt  \n",
+       "2      Medical debt  \n",
+       "3        Other debt  \n",
+       "4     I do not know  "
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "debt_training_df.head()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "id": "a14dbafd-6f1b-49cb-9712-434055da84f1",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Sub-product\n",
+       "Other debt                 2056\n",
+       "I do not know              1530\n",
+       "Credit card debt           1139\n",
+       "Medical debt                726\n",
+       "Auto debt                   397\n",
+       "Telecommunications debt     267\n",
+       "Rental debt                 122\n",
+       "Mortgage debt                94\n",
+       "Name: count, dtype: int64"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "debt_training_df['Sub-product'].value_counts()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "id": "b78398b7-d027-403f-acf4-fa580d113b02",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.6633986928104575\n",
+      "\n",
+      "Classification Report:\n",
+      "                         precision    recall  f1-score   support\n",
+      "\n",
+      "              Auto debt       0.95      0.48      0.64        44\n",
+      "       Credit card debt       0.59      0.96      0.73       127\n",
+      "           Medical debt       0.77      0.62      0.68        81\n",
+      "          Mortgage debt       1.00      0.40      0.57        10\n",
+      "            Rental debt       0.67      0.14      0.24        14\n",
+      "Telecommunications debt       1.00      0.13      0.24        30\n",
+      "\n",
+      "               accuracy                           0.66       306\n",
+      "              macro avg       0.83      0.46      0.52       306\n",
+      "           weighted avg       0.75      0.66      0.63       306\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "\n",
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "trained_model_d = train_model(debt_training_df, debt_val_df, 'Sub-product', rf_classifier, subproducts_to_drop=['Other debt', 'I do not know'], random_state=42)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "85bbc3fe-50b0-4578-8e67-151861f839da",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('models/Debt_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_d, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5c529ed8-3735-4494-9f90-6c005dfea6df",
+   "metadata": {},
+   "source": [
+    "#### Loan/Mortgages"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "id": "f33b26e9-4c5b-4498-ab23-a88aca5eb07f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "loans_training_df= pd.read_csv('../data_preprocessing_scripts/product_data_splits/loans___mortgage_train_data.csv')\n",
+    "loans_val_df= pd.read_csv('../data_preprocessing_scripts/product_data_splits/loans___mortgage_val_data.csv')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "id": "c8dcc18b-f7bb-4edd-965a-8c58500a0ea6",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Sub-product\n",
+       "Loan                              1464\n",
+       "Federal student loan servicing     914\n",
+       "Conventional home mortgage         236\n",
+       "Lease                              186\n",
+       "FHA mortgage                        94\n",
+       "Name: count, dtype: int64"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "loans_training_df['Sub-product'].value_counts()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "b0da7a52-e00a-413a-80be-2e8221851275",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.8757763975155279\n",
+      "\n",
+      "Classification Report:\n",
+      "                                precision    recall  f1-score   support\n",
+      "\n",
+      "    Conventional home mortgage       0.81      0.50      0.62        26\n",
+      "                  FHA mortgage       1.00      0.20      0.33        10\n",
+      "Federal student loan servicing       1.00      0.96      0.98       102\n",
+      "                         Lease       1.00      0.29      0.44        21\n",
+      "                          Loan       0.81      1.00      0.90       163\n",
+      "\n",
+      "                      accuracy                           0.88       322\n",
+      "                     macro avg       0.93      0.59      0.65       322\n",
+      "                  weighted avg       0.89      0.88      0.85       322\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "trained_model_l = train_model(loans_training_df, loans_val_df, 'Sub-product', rf_classifier, random_state=42)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "a668b946-da36-410f-b474-f8a311952c5d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('models/loan_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_l, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "74796ebf-9934-46d2-a1b7-d6672dea727c",
+   "metadata": {},
+   "source": [
+    "#### Checking or savings account"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "id": "1cc65f08-96c8-4458-8703-b84b7554a04c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cs_training_df= pd.read_csv('../data_preprocessing_scripts/product_data_splits/checking_or_savings_account_train_data.csv')\n",
+    "cs_val_df= pd.read_csv('../data_preprocessing_scripts/product_data_splits/checking_or_savings_account_val_data.csv')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "id": "240b2bcd-3839-4584-8a63-952fa17f9715",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Sub-product\n",
+       "Checking account                    13500\n",
+       "Savings account                      1391\n",
+       "Other banking product or service     1158\n",
+       "CD (Certificate of Deposit)           176\n",
+       "Name: count, dtype: int64"
+      ]
+     },
+     "execution_count": 15,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "cs_training_df['Sub-product'].value_counts()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "id": "3170c0c8-0dac-4755-aebf-dca9aa7f4dee",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.940099833610649\n",
+      "\n",
+      "Classification Report:\n",
+      "                                  precision    recall  f1-score   support\n",
+      "\n",
+      "     CD (Certificate of Deposit)       0.95      0.95      0.95        19\n",
+      "                Checking account       0.93      1.00      0.97      1500\n",
+      "Other banking product or service       1.00      0.60      0.75       129\n",
+      "                 Savings account       0.99      0.65      0.79       155\n",
+      "\n",
+      "                        accuracy                           0.94      1803\n",
+      "                       macro avg       0.97      0.80      0.86      1803\n",
+      "                    weighted avg       0.94      0.94      0.93      1803\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "trained_model_cs = train_model(cs_training_df, cs_val_df, 'Sub-product', rf_classifier, random_state=42)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "59c87ff1-d7de-41a9-9e0a-33630bff1c18",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('models/Checking_saving_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_cs, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fe443859-4be6-4b87-be79-22487aaf5b3b",
+   "metadata": {},
+   "source": [
+    "#### Credit/Prepaid Card"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 26,
+   "id": "31a70db8-06cb-4fb0-8d45-a7451aa81b0e",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cp_training_df= pd.read_csv('../data_preprocessing_scripts/product_data_splits/credit_prepaid_card_train_data.csv')\n",
+    "cp_val_df= pd.read_csv('../data_preprocessing_scripts/product_data_splits/credit_prepaid_card_val_data.csv')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 27,
+   "id": "0e70a22d-01f9-4f59-a903-286a05eb5179",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Sub-product\n",
+       "General-purpose credit card or charge card    13320\n",
+       "Store credit card                              2232\n",
+       "Name: count, dtype: int64"
+      ]
+     },
+     "execution_count": 27,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "cp_training_df['Sub-product'].value_counts()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 28,
+   "id": "ef3b03f6-8207-4292-8ce2-e6ca5695c606",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.9427414690572585\n",
+      "\n",
+      "Classification Report:\n",
+      "                                            precision    recall  f1-score   support\n",
+      "\n",
+      "General-purpose credit card or charge card       0.94      1.00      0.97      1481\n",
+      "                         Store credit card       1.00      0.60      0.75       248\n",
+      "\n",
+      "                                  accuracy                           0.94      1729\n",
+      "                                 macro avg       0.97      0.80      0.86      1729\n",
+      "                              weighted avg       0.95      0.94      0.94      1729\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "trained_model_cp = train_model(cp_training_df, cp_val_df, 'Sub-product', rf_classifier, random_state=42)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 21,
+   "id": "ac3f39d0-8cb8-457e-9db7-510cc5a99830",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('models/Credit_Prepaid_Card_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_cp, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "0787d4eb-9673-417b-91d1-cc98becd037e",
+   "metadata": {},
+   "source": [
+    "#### Credit reporting"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 22,
+   "id": "8e074864-16f6-4fd5-8bfe-b054aeb0fc2a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cr_training_df= pd.read_csv('../data_preprocessing_scripts/product_data_splits/credit_reporting_train_data.csv')\n",
+    "cr_val_df= pd.read_csv('../data_preprocessing_scripts/product_data_splits/credit_reporting_val_data.csv')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 23,
+   "id": "57257613-7dde-4561-942c-f559d2159744",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "Sub-product\n",
+       "Credit reporting                  13500\n",
+       "Other personal consumer report      661\n",
+       "Name: count, dtype: int64"
+      ]
+     },
+     "execution_count": 23,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "cr_training_df['Sub-product'].value_counts()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 24,
+   "id": "cca27513-501f-4257-a4b1-0e13a3604250",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Accuracy: 0.9841168996188056\n",
+      "\n",
+      "Classification Report:\n",
+      "                                precision    recall  f1-score   support\n",
+      "\n",
+      "              Credit reporting       0.99      1.00      0.99      1500\n",
+      "Other personal consumer report       0.93      0.72      0.81        74\n",
+      "\n",
+      "                      accuracy                           0.98      1574\n",
+      "                     macro avg       0.96      0.86      0.90      1574\n",
+      "                  weighted avg       0.98      0.98      0.98      1574\n",
+      "\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.ensemble import RandomForestClassifier\n",
+    "\n",
+    "rf_classifier = RandomForestClassifier(n_estimators=200, random_state=42)\n",
+    "trained_model_cr = train_model(cr_training_df, cr_val_df, 'Sub-product', rf_classifier, random_state=42)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 25,
+   "id": "3cbb9aa5-6c0c-4b59-a181-7431e8fc60fc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open('models/Credit_Reporting_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_cr, f)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9aea8fdd-ec86-40bc-b417-ba9169edabd9",
+   "metadata": {},
+   "source": [
+    "with open('models/Debt_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_d, f)\n",
+    "\n",
+    "with open('models/loan_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_l, f)\n",
+    "\n",
+    "with open('models/Checking_saving_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_cs, f)\n",
+    "\n",
+    "with open('models/Credit_Prepaid_Card_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_cp, f)\n",
+    "\n",
+    "with open('models/Credit_Reporting_model.pkl', 'wb') as f:\n",
+    "    pickle.dump(trained_model_cr, f)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.19"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
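
Note on the credit-reporting report in the notebook above: an accuracy of 0.98 sits next to a recall of 0.72 on "Other personal consumer report", which has only 74 of the 1574 validation rows. The sketch below recomputes the report rows in plain Python to show how the macro and weighted averages diverge under that imbalance. The confusion counts are an assumption: they were reconstructed from the printed accuracy and the rounded per-class values, not read from the notebook, and `per_class_metrics` is a hypothetical helper rather than part of the commit.

```python
# Confusion counts reconstructed from the printed report (an assumption,
# not notebook output). Rows are true classes, columns are predictions.
labels = ["Credit reporting", "Other personal consumer report"]
confusion = [
    [1496, 4],   # true "Credit reporting"
    [21, 53],    # true "Other personal consumer report"
]


def per_class_metrics(confusion):
    """Return (precision, recall, f1, support) for each class."""
    n = len(confusion)
    rows = []
    for i in range(n):
        tp = confusion[i][i]
        support = sum(confusion[i])                      # true members of class i
        predicted = sum(confusion[r][i] for r in range(n))  # rows predicted as i
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1, support))
    return rows


rows = per_class_metrics(confusion)
total = sum(r[3] for r in rows)

accuracy = sum(confusion[i][i] for i in range(len(confusion))) / total
# Macro average: every class counts equally, so the weak minority class shows.
macro_recall = sum(r[1] for r in rows) / len(rows)
# Weighted average: classes count by support, so the majority class dominates.
weighted_recall = sum(r[1] * r[3] for r in rows) / total

for name, (p, r, f1, support) in zip(labels, rows):
    print(f"{name:>32}  {p:.2f}  {r:.2f}  {f1:.2f}  {support}")
print(f"accuracy={accuracy:.4f} macro_recall={macro_recall:.2f} "
      f"weighted_recall={weighted_recall:.2f}")
```

Support-weighted recall is always equal to accuracy, so the macro row is the one to watch when the minority sub-product matters.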

subproduct_prediction/issue_models/account_operations_and_unauthorized_transaction_issues.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f5a38e0d8214e3f947f2425245fb0cabd6484cbf5416bc7cb967be933d550e48
+size 13402084

subproduct_prediction/issue_models/attempts_to_collect_debt_not_owed.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5d2f89f554b692874926acc0622cc0da2b0d373adf6fe0ef991d396751a3e1fb
+size 35287313

subproduct_prediction/issue_models/closing_an_account.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3d6e05991b41724502ec39bcc0b36f4a99bcdc53b3f25e5b43dded7e6bdb872b
+size 13327249

subproduct_prediction/issue_models/closing_your_account.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6c1c947503ffd02bb74c27ea5516e92542c12a1f7075a8dd1ca2b40a50924a47
+size 3219384

subproduct_prediction/issue_models/credit_report_and_monitoring_issues.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cc0b2236e3428e157037f2a1be6b1895811df1f9c26552423278a796cc420700
+size 4546265

subproduct_prediction/issue_models/dealing_with_your_lender_or_servicer.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:48357a57a2aa170a13b424d9a0ccffc4aae3b03c7d072dec125bef02e5c24e11
+size 6053321

subproduct_prediction/issue_models/disputes_and_misrepresentations.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:84658b48682d815eaf93db9ead50d7c34a92ed9e09a72ff5397b594197bf3d10
+size 14356455

subproduct_prediction/issue_models/improper_use_of_your_report.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:64f73d84a7db394d0049116190ed19b7630c39ae4672f7c9840907d0e77ba544
+size 122627308

subproduct_prediction/issue_models/incorrect_information_on_your_report.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f2d2319330ab30e110677a3137e9a12c55d913f3b0ec4f6fa5a4e00353612ec3
+size 459390697

subproduct_prediction/issue_models/legal_and_threat_actions.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fedd05ae3cdb61f8015b09b487aa5740fbc92b97b480b0bde5d1c65d753fd54e
+size 224561

subproduct_prediction/issue_models/managing_an_account.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2528894a0f0fb7f90626d899b8059486e2f9fc21ce4dbb54f27cc284483ebeb0
+size 85679764

subproduct_prediction/issue_models/payment_and_funds_management.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d72801fc16ca98b4cbc703a3f8ad5f8103f087c3dcbe0e1c80a63401192f3f73
+size 11929289

subproduct_prediction/issue_models/problem_with_a_company's_investigation_into_an_existing_issue.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d71f394eaf1623337c71638ff409e1374e9bba3c87851f5ec9421f343629892d
+size 2050572

subproduct_prediction/issue_models/problem_with_a_company's_investigation_into_an_existing_problem.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:57ebfb56b53d745eeae4c732227f4af83f98242b1083c3a63963ec9aabcfbec1
+size 49789793

subproduct_prediction/issue_models/problem_with_a_credit_reporting_company's_investigation_into_an_existing_problem.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ff6c316c3100e632b1cfafee59d81f52f07d75d6e8af9323e72e5dcb9997ed5b
+size 132836007

subproduct_prediction/issue_models/problem_with_a_purchase_shown_on_your_statement.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cca681b5711e049df5e344596be0aa0f4a06db512aede2bba31899192aff2db8
+size 13227946

subproduct_prediction/issue_models/written_notification_about_debt.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:69e250f983a5dcaf1630d27fe5386402f3b99d9e22ebed40904f3d790245e1ef
+size 9169604

subproduct_prediction/models/Checking_saving_model.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:aa166dafab04f7c1ec8431cf5b9ccdfe9486abf0b7ad505ed835142a615029dd
+size 67244100

subproduct_prediction/models/Credit_Prepaid_Card_model.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8e0d8cee975c35ce85b98db9005a4517fe95f9a8f8b3fcb5b50e9fecd1c0a003
+size 44123155

subproduct_prediction/models/Credit_Reporting_model.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8ede473b35d13d58aa40501d27c6126403c5711dcedfcf0639ef472f7228967d
+size 18568054

subproduct_prediction/models/Debt_model.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d6be682dfe69330f5154309f4a00e0df313c3d7a75ca5d005ab8dd2394cc4ffb
+size 39776752

subproduct_prediction/models/Product_model.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:879ef7eea5e9d6e5e03c596bec4ac9cb18b9276ace228d53a2c44cf3912d280c
+size 288515807

subproduct_prediction/models/loan_model.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:63c50de262139d08a1a3c380cf2a4fa94114273a73daffd04ef2cc94859a9259
+size 23675105
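
Each `.pkl` entry above is a Git LFS pointer rather than the model itself: a spec version line, the SHA-256 of the real file, and its size in bytes. A minimal sketch, assuming the actual file has already been fetched locally, of how a download could be checked against its pointer. `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of this repository or of the Git LFS tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line has the form "<hash-algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated download
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Read in chunks so large model files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the loan_model.pkl entry above.
pointer = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:63c50de262139d08a1a3c380cf2a4fa94114273a73daffd04ef2cc94859a9259\n"
    "size 23675105\n"
)
print(pointer["algo"], pointer["size"])
```

Checking a file this way before calling `pickle.load` confirms that it is the committed artifact and not a truncated download or an unresolved pointer stub.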