{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# Fastag Fraud Detection" ], "metadata": { "id": "Nx7NdXm-7X4C" } }, { "cell_type": "markdown", "source": [ "## Project Pipeline\n", "\n", "1. **Data Exploration**:\n", " - Explore the dataset to understand the distribution of features and the prevalence of fraud indicators.\n", "\n", "2. **Feature Engineering**:\n", " - Identify and engineer relevant features that contribute to fraud detection accuracy.\n", "\n", "3. **Model Development**:\n", " - Build a machine learning classification model to predict and detect Fastag transaction fraud.\n", " - Evaluate and fine-tune model performance using appropriate metrics.\n", "\n", "4. **Real-time Fraud Detection**:\n", " - Explore the feasibility of implementing the model for real-time Fastag fraud detection.\n", "\n", "5. **Explanatory Analysis**:\n", " - Provide insights into the factors contributing to fraudulent transactions." ], "metadata": { "id": "033Tp9X19HJ5" } }, { "cell_type": "markdown", "source": [ "## Dataset Description\n", "\n", "The dataset contains the following columns:\n", "\n", "1. **Transaction_ID**: Unique identifier for each transaction.\n", "2. **Timestamp**: Date and time of the transaction.\n", "3. **Vehicle_Type**: Type of vehicle involved in the transaction.\n", "4. **FastagID**: Unique identifier for Fastag.\n", "5. **TollBoothID**: Identifier for the toll booth.\n", "6. **Lane_Type**: Type of lane used for the transaction.\n", "7. **Vehicle_Dimensions**: Dimensions of the vehicle.\n", "8. **Transaction_Amount**: Amount associated with the transaction.\n", "9. **Amount_paid**: Amount paid for the transaction.\n", "10. **Geographical_Location**: Location details of the transaction.\n", "11. 
**Vehicle_Speed**: Speed of the vehicle during the transaction.\n", "12. **Vehicle_Plate_Number**: License plate number of the vehicle.\n", "13. **Fraud_indicator**: Binary indicator of fraudulent activity (target variable).\n" ], "metadata": { "id": "XjP3xfZxCT1r" } }, { "cell_type": "markdown", "source": [ "# Exploratory Data Analysis (EDA)" ], "metadata": { "id": "balcKLWlA8ev" } }, { "cell_type": "code", "execution_count": 67, "metadata": { "id": "7vVanFDO6tx9" }, "outputs": [], "source": [ "# Let's import libraries\n", "import pandas as pd" ] }, { "cell_type": "code", "source": [ "df = pd.read_csv('/content/FastagFraudDetection.csv')" ], "metadata": { "id": "jYByW32N7rJP" }, "execution_count": 68, "outputs": [] }, { "cell_type": "code", "source": [ "# Let's look at columns\n", "df.columns" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "7q6qaY-M7x5a", "outputId": "1581087e-3e33-4808-8b87-00ab77352052" }, "execution_count": 69, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Index(['Transaction_ID', 'Timestamp', 'Vehicle_Type', 'FastagID',\n", " 'TollBoothID', 'Lane_Type', 'Vehicle_Dimensions', 'Transaction_Amount',\n", " 'Amount_paid', 'Geographical_Location', 'Vehicle_Speed',\n", " 'Vehicle_Plate_Number', 'Fraud_indicator'],\n", " dtype='object')" ] }, "metadata": {}, "execution_count": 69 } ] }, { "cell_type": "code", "source": [ "# Let's look at first 5 rows\n", "df.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 313 }, "id": "xjs87Lib9gN6", "outputId": "9829ff57-b80c-48b7-bd1c-0afe12cda61e" }, "execution_count": 70, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Transaction_ID Timestamp Vehicle_Type FastagID TollBoothID \\\n", "0 1 1/6/2023 11:20 Bus FTG-001-ABC-121 A-101 \n", "1 2 1/7/2023 14:55 Car FTG-002-XYZ-451 B-102 \n", "2 3 1/8/2023 18:25 Motorcycle NaN D-104 \n", "3 4 1/9/2023 2:05 Truck FTG-044-LMN-322 C-103 \n", "4 5 1/10/2023 
6:35 Van FTG-505-DEF-652 B-102 \n", "\n", " Lane_Type Vehicle_Dimensions Transaction_Amount Amount_paid \\\n", "0 Express Large 350 120 \n", "1 Regular Small 120 100 \n", "2 Regular Small 0 0 \n", "3 Regular Large 350 120 \n", "4 Express Medium 140 100 \n", "\n", " Geographical_Location Vehicle_Speed Vehicle_Plate_Number \\\n", "0 13.059816123454882, 77.77068662374292 65 KA11AB1234 \n", "1 13.059816123454882, 77.77068662374292 78 KA66CD5678 \n", "2 13.059816123454882, 77.77068662374292 53 KA88EF9012 \n", "3 13.059816123454882, 77.77068662374292 92 KA11GH3456 \n", "4 13.059816123454882, 77.77068662374292 60 KA44IJ6789 \n", "\n", " Fraud_indicator \n", "0 Fraud \n", "1 Fraud \n", "2 Not Fraud \n", "3 Fraud \n", "4 Fraud " ], "text/html": [ "\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "df", "summary": "{\n \"name\": \"df\",\n \"rows\": 5000,\n \"fields\": [\n {\n \"column\": \"Transaction_ID\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1443,\n \"min\": 1,\n \"max\": 5000,\n \"num_unique_values\": 5000,\n \"samples\": [\n 1502,\n 2587,\n 2654\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Timestamp\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 4423,\n \"samples\": [\n \"6/25/2023 7:17\",\n \"10/22/2023 2:04\",\n \"2/5/2023 0:42\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Vehicle_Type\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 7,\n \"samples\": [\n \"Bus \",\n \"Car\",\n \"Sedan\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"FastagID\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 4451,\n \"samples\": [\n \"FTG-580-DEF-850\",\n \"FTG-083-PQR-333\",\n \"FTG-125-EDC-765\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"TollBoothID\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"A-101\",\n \"B-102\",\n \"D-106\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Lane_Type\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"Regular\",\n \"Express\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Vehicle_Dimensions\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 3,\n \"samples\": [\n \"Large\",\n \"Small\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Transaction_Amount\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 112,\n \"min\": 0,\n \"max\": 350,\n 
\"num_unique_values\": 20,\n \"samples\": [\n 350,\n 330\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Amount_paid\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 106,\n \"min\": 0,\n \"max\": 350,\n \"num_unique_values\": 23,\n \"samples\": [\n 340,\n 60\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Geographical_Location\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"13.042660878688794, 77.47580097259879\",\n \"13.21331620748757, 77.55413526894684\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Vehicle_Speed\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 16,\n \"min\": 10,\n \"max\": 118,\n \"num_unique_values\": 85,\n \"samples\": [\n 35,\n 65\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Vehicle_Plate_Number\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5000,\n \"samples\": [\n \"KA05CD5678\",\n \"KA67LM4267\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Fraud_indicator\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"Not Fraud\",\n \"Fraud\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 70 } ] }, { "cell_type": "code", "source": [ "# Let's check up missing value\n", "df.isnull().any()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "zUdPKKpE9hbR", "outputId": "c02bfa26-c9de-444e-c1a4-220de494385d" }, "execution_count": 71, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Transaction_ID False\n", "Timestamp False\n", "Vehicle_Type False\n", "FastagID True\n", "TollBoothID False\n", "Lane_Type False\n", "Vehicle_Dimensions False\n", "Transaction_Amount False\n", "Amount_paid False\n", 
"Geographical_Location False\n", "Vehicle_Speed False\n", "Vehicle_Plate_Number False\n", "Fraud_indicator False\n", "dtype: bool" ] }, "metadata": {}, "execution_count": 71 } ] }, { "cell_type": "code", "source": [ "# Let's check up missing value count\n", "df.isnull().sum()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "5N47_ZUF9r6Z", "outputId": "da5da61a-df41-4c8a-f34f-32aabc271c96" }, "execution_count": 72, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Transaction_ID 0\n", "Timestamp 0\n", "Vehicle_Type 0\n", "FastagID 549\n", "TollBoothID 0\n", "Lane_Type 0\n", "Vehicle_Dimensions 0\n", "Transaction_Amount 0\n", "Amount_paid 0\n", "Geographical_Location 0\n", "Vehicle_Speed 0\n", "Vehicle_Plate_Number 0\n", "Fraud_indicator 0\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 72 } ] }, { "cell_type": "code", "source": [ "# Let's look at shape of dataframe\n", "df.shape" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "vpftuKMn9wLp", "outputId": "5c9fa3c2-4b81-43b6-ce98-f1fa67f2686d" }, "execution_count": 73, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "(5000, 13)" ] }, "metadata": {}, "execution_count": 73 } ] }, { "cell_type": "code", "source": [ "# Let's look at number of unique values\n", "df.nunique()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "YaMBm1lI94ip", "outputId": "a6b35aa6-0311-4e86-aeaa-8e58dd230cb1" }, "execution_count": 74, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Transaction_ID 5000\n", "Timestamp 4423\n", "Vehicle_Type 7\n", "FastagID 4451\n", "TollBoothID 6\n", "Lane_Type 2\n", "Vehicle_Dimensions 3\n", "Transaction_Amount 20\n", "Amount_paid 23\n", "Geographical_Location 5\n", "Vehicle_Speed 85\n", "Vehicle_Plate_Number 5000\n", "Fraud_indicator 2\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 74 } ] }, { "cell_type": "code", "source": [ 
"for i in df.columns:\n", " print(f'{i}: {df[i].unique()[:15]}')" ], "metadata": { "id": "1aHCyVdgaRgj", "outputId": "bb2b9c33-1a52-4bbb-be24-f17a8be447bc", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": 75, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Transaction_ID: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]\n", "Timestamp: ['1/6/2023 11:20' '1/7/2023 14:55' '1/8/2023 18:25' '1/9/2023 2:05'\n", " '1/10/2023 6:35' '1/11/2023 10:00' '1/12/2023 15:40' '1/13/2023 20:15'\n", " '1/14/2023 1:55' '1/15/2023 7:30' '1/16/2023 12:10' '1/17/2023 17:45'\n", " '1/18/2023 22:20' '1/19/2023 4:00' '1/20/2023 8:30']\n", "Vehicle_Type: ['Bus ' 'Car' 'Motorcycle' 'Truck' 'Van' 'Sedan' 'SUV']\n", "FastagID: ['FTG-001-ABC-121' 'FTG-002-XYZ-451' nan 'FTG-044-LMN-322'\n", " 'FTG-505-DEF-652' 'FTG-066-GHI-987' 'FTG-707-JKL-210' 'FTG-088-UVW-543'\n", " 'FTG-909-RST-876' 'FTG-021-QWE-765' 'FTG-011-ZXC-431' 'FTG-013-POI-104'\n", " 'FTG-014-KJH-872' 'FTG-055-DCV-543' 'FTG-066-NBH-210']\n", "TollBoothID: ['A-101' 'B-102' 'D-104' 'C-103' 'D-105' 'D-106']\n", "Lane_Type: ['Express' 'Regular']\n", "Vehicle_Dimensions: ['Large' 'Small' 'Medium']\n", "Transaction_Amount: [350 120 0 140 160 180 290 110 100 130 60 150 340 300 70]\n", "Amount_paid: [120 100 0 160 90 180 350 140 110 60 290 130 70 190 150]\n", "Geographical_Location: ['13.059816123454882, 77.77068662374292'\n", " '13.042660878688794, 77.47580097259879'\n", " '12.84197701525119, 77.67547528176169'\n", " '12.936687032945434, 77.53113977439017'\n", " '13.21331620748757, 77.55413526894684']\n", "Vehicle_Speed: [ 65 78 53 92 60 105 70 88 45 72 58 81 67 98 50]\n", "Vehicle_Plate_Number: ['KA11AB1234' 'KA66CD5678' 'KA88EF9012' 'KA11GH3456' 'KA44IJ6789'\n", " 'KA77KL0123' 'KA22MN4567' 'KA21OP8901' 'KA16QR2345' 'KA22ST6789'\n", " 'KA12UV0123' 'KA35WX3454' 'KA38YZ6785' 'KA14AB0123' 'KA40CD4557']\n", "Fraud_indicator: ['Fraud' 'Not Fraud']\n" ] } ] }, { "cell_type": "code", "source": [ "# Let's look 
at info of dataframe\n", "df.info()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "O3oWL-jC98DZ", "outputId": "91be4257-eb2b-492e-9bdc-af61ceec957a" }, "execution_count": 11, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 5000 entries, 0 to 4999\n", "Data columns (total 13 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Transaction_ID 5000 non-null int64 \n", " 1 Timestamp 5000 non-null object\n", " 2 Vehicle_Type 5000 non-null object\n", " 3 FastagID 4451 non-null object\n", " 4 TollBoothID 5000 non-null object\n", " 5 Lane_Type 5000 non-null object\n", " 6 Vehicle_Dimensions 5000 non-null object\n", " 7 Transaction_Amount 5000 non-null int64 \n", " 8 Amount_paid 5000 non-null int64 \n", " 9 Geographical_Location 5000 non-null object\n", " 10 Vehicle_Speed 5000 non-null int64 \n", " 11 Vehicle_Plate_Number 5000 non-null object\n", " 12 Fraud_indicator 5000 non-null object\n", "dtypes: int64(4), object(9)\n", "memory usage: 507.9+ KB\n" ] } ] }, { "cell_type": "code", "source": [ "# Let's look at the descriptive statistics of the numeric columns\n", "df.describe().T" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 175 }, "id": "RRN5CXeR-Fib", "outputId": "ce8d1f25-c7f1-4307-a9a3-9d57268d98b3" }, "execution_count": 12, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " count mean std min 25% 50% \\\n", "Transaction_ID 5000.0 2500.5000 1443.520003 1.0 1250.75 2500.5 \n", "Transaction_Amount 5000.0 161.0620 112.449950 0.0 100.00 130.0 \n", "Amount_paid 5000.0 141.2610 106.480996 0.0 90.00 120.0 \n", "Vehicle_Speed 5000.0 67.8512 16.597547 10.0 54.00 67.0 \n", "\n", " 75% max \n", "Transaction_ID 3750.25 5000.0 \n", "Transaction_Amount 290.00 350.0 \n", "Amount_paid 160.00 350.0 \n", "Vehicle_Speed 82.00 118.0 " ], "text/html": [ "\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "summary": "{\n \"name\": \"df\",\n \"rows\": 4,\n \"fields\": [\n {\n \"column\": \"count\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 0.0,\n \"min\": 5000.0,\n \"max\": 5000.0,\n \"num_unique_values\": 1,\n \"samples\": [\n 5000.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"mean\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1189.2304350110944,\n \"min\": 67.8512,\n \"max\": 2500.5,\n \"num_unique_values\": 4,\n \"samples\": [\n 161.062\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"std\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 683.9122043112384,\n \"min\": 16.597546634091863,\n \"max\": 1443.5200033252052,\n \"num_unique_values\": 4,\n \"samples\": [\n 112.44994955192665\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"min\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 4.856267428111155,\n \"min\": 0.0,\n \"max\": 10.0,\n \"num_unique_values\": 3,\n \"samples\": [\n 1.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"25%\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 585.0419278066032,\n \"min\": 54.0,\n \"max\": 1250.75,\n \"num_unique_values\": 4,\n \"samples\": [\n 100.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"50%\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1197.7357370611153,\n \"min\": 67.0,\n \"max\": 2500.5,\n \"num_unique_values\": 4,\n \"samples\": [\n 130.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"75%\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 1788.5173372447357,\n \"min\": 82.0,\n \"max\": 3750.25,\n \"num_unique_values\": 4,\n \"samples\": [\n 290.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": 
\"max\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 2366.195469524866,\n \"min\": 118.0,\n \"max\": 5000.0,\n \"num_unique_values\": 3,\n \"samples\": [\n 5000.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" } }, "metadata": {}, "execution_count": 12 } ] }, { "cell_type": "code", "source": [ "# Let's look at data types of columns\n", "df.dtypes" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "PRni1U5T-HRB", "outputId": "b5e4b93f-7752-42a8-e71c-5b850c9aebfa" }, "execution_count": 11, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Transaction_ID int64\n", "Timestamp object\n", "Vehicle_Type object\n", "FastagID object\n", "TollBoothID object\n", "Lane_Type object\n", "Vehicle_Dimensions object\n", "Transaction_Amount int64\n", "Amount_paid int64\n", "Geographical_Location object\n", "Vehicle_Speed int64\n", "Vehicle_Plate_Number object\n", "Fraud_indicator object\n", "dtype: object" ] }, "metadata": {}, "execution_count": 11 } ] }, { "cell_type": "markdown", "source": [ "# Feature Engineering" ], "metadata": { "id": "ppPsYltUBLkK" } }, { "cell_type": "code", "source": [ "df.head(2)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 214 }, "id": "vIcdUS_9BKTZ", "outputId": "8618db92-10ed-4c40-e5de-a391b91c45fd" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Transaction_ID Timestamp FastagID TollBoothID \\\n", "0 1 1/6/2023 11:20 FTG-001-ABC-121 101 \n", "1 2 1/7/2023 14:55 FTG-002-XYZ-451 102 \n", "\n", " Transaction_Amount Amount_paid Geographical_Location \\\n", "0 350 120 13.059816123454882, 77.77068662374292 \n", "1 120 100 13.059816123454882, 77.77068662374292 \n", "\n", " Vehicle_Speed Vehicle_Plate_Number Fraud_indicator ... Lane_Type_Regular \\\n", "0 65 KA11AB1234 1 ... False \n", "1 78 KA66CD5678 1 ... 
True \n", "\n", " Vehicle_Dimensions_Medium Vehicle_Dimensions_Small \\\n", "0 False False \n", "1 False True \n", "\n", " Vehicle_Speed_group_21-41 Vehicle_Speed_group_41-93 \\\n", "0 False True \n", "1 False True \n", "\n", " Vehicle_Speed_group_93-103 Vehicle_Speed_group_<= 21 \\\n", "0 False False \n", "1 False False \n", "\n", " Transaction_Amount_group_330 < Transaction_Amount_group_60-180 \\\n", "0 True False \n", "1 False True \n", "\n", " Transaction_Amount_group_<= 60 \n", "0 False \n", "1 False \n", "\n", "[2 rows x 29 columns]" ], "text/html": [ "\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "df" } }, "metadata": {}, "execution_count": 53 } ] }, { "cell_type": "code", "source": [ "# # Let's separate the location column into lat and long\n", "# df[['lat', 'long']] = df['Geographical_Location'].apply(lambda x: pd.Series(x.split(',')))" ], "metadata": { "id": "XkGCYoRFo9F0" }, "execution_count": 13, "outputs": [] }, { "cell_type": "code", "source": [ "# Let's compute the fraction of the transaction amount that was actually paid (0 when the amount is 0)\n", "df['Percent_paid'] = df.apply(lambda row: row['Amount_paid'] / row['Transaction_Amount'] if row['Transaction_Amount'] != 0 else 0, axis=1)" ], "metadata": { "id": "FUkOBL3hp1RE" }, "execution_count": 76, "outputs": [] }, { "cell_type": "code", "source": [ "# Let's encode the target as integers: Fraud -> 1, Not Fraud -> 0\n", "df['Fraud_indicator'] = df['Fraud_indicator'].map({'Fraud': 1, 'Not Fraud': 0})" ], "metadata": { "id": "OwshZBDAqnXz" }, "execution_count": 77, "outputs": [] }, { "cell_type": "code", "source": [ "# Let's plot the mean fraud rate by speed and by amount to choose bin edges for grouping\n", "import matplotlib.pyplot as plt\n", "for i in ['Vehicle_Speed', 'Transaction_Amount']:\n", "    fraud_rate = df.groupby(by=i).aggregate({'Fraud_indicator': 'mean'})\n", "    plt.figure(figsize=(10, 3))\n", "    plt.plot(fraud_rate.index, fraud_rate['Fraud_indicator'], marker='o', linestyle='-', color='b', label='Mean Fraud Indicator')\n", "    plt.title(f'Mean Fraud Indicator by {i}', fontsize=16)\n", "    plt.xlabel(f'{i}', fontsize=14)\n", "    plt.ylabel('Mean Fraud Indicator', fontsize=14)\n", "    step_size = 2\n", "    plt.xticks(fraud_rate.index[::step_size], fontsize=12, rotation=90)\n", "    plt.yticks(fontsize=12)\n", "    plt.grid(True)\n", "    plt.tight_layout()\n", "\n", "    plt.show()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 597 }, "id": "KlvEqtKLHF6H", "outputId": "7c0ebc22-2bf1-4ec2-ba48-8f324ec5f1fc" }, "execution_count": null, "outputs": [ { "output_type": "display_data", "data": { 
"text/plain": [ "
" ], "image/png": "\n" }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": {} } ] }, { "cell_type": "code", "source": [ "# Let's also group Vehicle_Speed, Transaction_Amount columns\n", "def Vehicle_Speed_group(speed):\n", " if speed <= 21:\n", " return '<= 21'\n", " elif speed <= 41:\n", " return '21-41'\n", " elif speed <= 93:\n", " return '41-93'\n", " elif speed <= 103:\n", " return '93-103'\n", " else:\n", " return '103 <'\n", "\n", "def Transaction_Amount_group(amount):\n", " if amount <= 60:\n", " return '<= 60'\n", " elif amount <= 180:\n", " return '60-180'\n", " elif amount <= 330:\n", " return '180-330'\n", " else:\n", " return '330 <'\n", "\n", "df['Vehicle_Speed_group'] = df['Vehicle_Speed'].apply(lambda x: Vehicle_Speed_group(x))\n", "df['Transaction_Amount_group'] = df['Transaction_Amount'].apply(lambda x: Transaction_Amount_group(x))" ], "metadata": { "id": "sTv3x6AxIvUI" }, "execution_count": 78, "outputs": [] }, { "cell_type": "code", "source": [ "# Let's encode categorical features\n", "df = pd.get_dummies(df, columns=['Vehicle_Type', 'Lane_Type','TollBoothID', 'Vehicle_Dimensions','Vehicle_Speed_group','Transaction_Amount_group'], drop_first=True)" ], "metadata": { "id": "r9Jui9rNOUcI" }, "execution_count": 79, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Data Preprocessing" ], "metadata": { "id": "5Kl79dn9PPmH" } }, { "cell_type": "code", "source": [ "df.head()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 360 }, "id": "VmXFcpOlxE1y", "outputId": "d76a6def-bef9-444e-ead1-3266aa11d287" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Transaction_ID Timestamp FastagID Transaction_Amount \\\n", "0 1 1/6/2023 11:20 FTG-001-ABC-121 350 \n", "1 2 1/7/2023 14:55 FTG-002-XYZ-451 120 \n", "2 3 1/8/2023 18:25 NaN 0 \n", "3 4 1/9/2023 2:05 FTG-044-LMN-322 350 \n", "4 5 1/10/2023 6:35 FTG-505-DEF-652 140 \n", "\n", " Amount_paid Geographical_Location Vehicle_Speed \\\n", "0 120 
13.059816123454882, 77.77068662374292 65 \n", "1 100 13.059816123454882, 77.77068662374292 78 \n", "2 0 13.059816123454882, 77.77068662374292 53 \n", "3 120 13.059816123454882, 77.77068662374292 92 \n", "4 100 13.059816123454882, 77.77068662374292 60 \n", "\n", " Vehicle_Plate_Number Fraud_indicator lat ... \\\n", "0 KA11AB1234 1 13.059816123454882 ... \n", "1 KA66CD5678 1 13.059816123454882 ... \n", "2 KA88EF9012 0 13.059816123454882 ... \n", "3 KA11GH3456 1 13.059816123454882 ... \n", "4 KA44IJ6789 1 13.059816123454882 ... \n", "\n", " TollBoothID_D-106 Vehicle_Dimensions_Medium Vehicle_Dimensions_Small \\\n", "0 False False False \n", "1 False False True \n", "2 False False True \n", "3 False False False \n", "4 False True False \n", "\n", " Vehicle_Speed_group_21-41 Vehicle_Speed_group_41-93 \\\n", "0 False True \n", "1 False True \n", "2 False True \n", "3 False True \n", "4 False True \n", "\n", " Vehicle_Speed_group_93-103 Vehicle_Speed_group_<= 21 \\\n", "0 False False \n", "1 False False \n", "2 False False \n", "3 False False \n", "4 False False \n", "\n", " Transaction_Amount_group_330 < Transaction_Amount_group_60-180 \\\n", "0 True False \n", "1 False True \n", "2 False False \n", "3 True False \n", "4 False True \n", "\n", " Transaction_Amount_group_<= 60 \n", "0 False \n", "1 False \n", "2 True \n", "3 False \n", "4 False \n", "\n", "[5 rows x 33 columns]" ], "text/html": [ "\n", "
\n" ], "application/vnd.google.colaboratory.intrinsic+json": { "type": "dataframe", "variable_name": "df" } }, "metadata": {}, "execution_count": 74 } ] }, { "cell_type": "code", "source": [ "# Let's drop identifier and raw columns that will not be used as features\n", "# (this also removes FastagID, the only column with missing values)\n", "df.drop(columns = ['Transaction_ID','Timestamp','FastagID','Geographical_Location','Vehicle_Plate_Number'], inplace = True)" ], "metadata": { "id": "XsD8KB7oPAUu" }, "execution_count": 80, "outputs": [] }, { "cell_type": "code", "source": [ "# Let's confirm that no missing values remain\n", "df.isnull().sum()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "bgS6H9TeQWCX", "outputId": "57aa3ccd-c563-46d5-f81f-f792a45cc85d" }, "execution_count": 81, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Transaction_Amount 0\n", "Amount_paid 0\n", "Vehicle_Speed 0\n", "Fraud_indicator 0\n", "Percent_paid 0\n", "Vehicle_Type_Car 0\n", "Vehicle_Type_Motorcycle 0\n", "Vehicle_Type_SUV 0\n", "Vehicle_Type_Sedan 0\n", "Vehicle_Type_Truck 0\n", "Vehicle_Type_Van 0\n", "Lane_Type_Regular 0\n", "TollBoothID_B-102 0\n", "TollBoothID_C-103 0\n", "TollBoothID_D-104 0\n", "TollBoothID_D-105 0\n", "TollBoothID_D-106 0\n", "Vehicle_Dimensions_Medium 0\n", "Vehicle_Dimensions_Small 0\n", "Vehicle_Speed_group_21-41 0\n", "Vehicle_Speed_group_41-93 0\n", "Vehicle_Speed_group_93-103 0\n", "Vehicle_Speed_group_<= 21 0\n", "Transaction_Amount_group_330 < 0\n", "Transaction_Amount_group_60-180 0\n", "Transaction_Amount_group_<= 60 0\n", "dtype: int64" ] }, "metadata": {}, "execution_count": 81 } ] }, { "cell_type": "code", "source": [ "# Let's convert bool columns (and any numeric-looking object columns) to int\n", "for col in df.select_dtypes(include=['bool', 'object']):\n", "    try:\n", "        df[col] = df[col].astype(int)\n", "    except ValueError:\n", "        # Fall back for strings such as '13.05' that cannot be cast to int directly\n", "        df[col] = df[col].astype(float).round().astype(int)" ], "metadata": { "id": "jI2DrcHtSaUm" }, "execution_count": 82, "outputs": [] }, { "cell_type": "markdown", "source": [ "Okay, great! ✅" ], "metadata": { "id": 
"9MD32W9WWjnN" } }, { "cell_type": "code", "source": [ "# Let's check unique values for each columns\n", "for i in df.columns:\n", " print(f'{i}: [{df[i].nunique()}]')" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "KOA4JYADQE01", "outputId": "dea4cb10-14d0-491b-c852-155970d4d39d" }, "execution_count": 20, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Transaction_Amount: [20]\n", "Amount_paid: [23]\n", "Vehicle_Speed: [85]\n", "Fraud_indicator: [2]\n", "lat: [1]\n", "long: [2]\n", "Percent_paid: [84]\n", "Vehicle_Type_Car: [2]\n", "Vehicle_Type_Motorcycle: [2]\n", "Vehicle_Type_SUV: [2]\n", "Vehicle_Type_Sedan: [2]\n", "Vehicle_Type_Truck: [2]\n", "Vehicle_Type_Van: [2]\n", "Lane_Type_Regular: [2]\n", "TollBoothID_B-102: [2]\n", "TollBoothID_C-103: [2]\n", "TollBoothID_D-104: [2]\n", "TollBoothID_D-105: [2]\n", "TollBoothID_D-106: [2]\n", "Vehicle_Dimensions_Medium: [2]\n", "Vehicle_Dimensions_Small: [2]\n", "Vehicle_Speed_group_21-41: [2]\n", "Vehicle_Speed_group_41-93: [2]\n", "Vehicle_Speed_group_93-103: [2]\n", "Vehicle_Speed_group_<= 21: [2]\n", "Transaction_Amount_group_330 <: [2]\n", "Transaction_Amount_group_60-180: [2]\n", "Transaction_Amount_group_<= 60: [2]\n" ] } ] }, { "cell_type": "code", "source": [ "df.dtypes" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ycpdIgLq0UfZ", "outputId": "1b5e2e52-8816-4ac8-d25c-619484b44029" }, "execution_count": null, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Transaction_Amount int64\n", "Amount_paid int64\n", "Vehicle_Speed int64\n", "Fraud_indicator int64\n", "lat object\n", "long object\n", "Percent_paid float64\n", "Vehicle_Type_Car int64\n", "Vehicle_Type_Motorcycle int64\n", "Vehicle_Type_SUV int64\n", "Vehicle_Type_Sedan int64\n", "Vehicle_Type_Truck int64\n", "Vehicle_Type_Van int64\n", "Lane_Type_Regular int64\n", "TollBoothID_B-102 int64\n", "TollBoothID_C-103 int64\n", "TollBoothID_D-104 
int64\n", "TollBoothID_D-105 int64\n", "TollBoothID_D-106 int64\n", "Vehicle_Dimensions_Medium int64\n", "Vehicle_Dimensions_Small int64\n", "Vehicle_Speed_group_21-41 int64\n", "Vehicle_Speed_group_41-93 int64\n", "Vehicle_Speed_group_93-103 int64\n", "Vehicle_Speed_group_<= 21 int64\n", "Transaction_Amount_group_330 < int64\n", "Transaction_Amount_group_60-180 int64\n", "Transaction_Amount_group_<= 60 int64\n", "dtype: object" ] }, "metadata": {}, "execution_count": 90 } ] }, { "cell_type": "markdown", "source": [ "# Machine Learning Model Development" ], "metadata": { "id": "IXrvqggoYIJW" } }, { "cell_type": "code", "source": [ "# Let's copy dataset\n", "ml_data = df.copy()" ], "metadata": { "id": "WOJmMqrhYD9G" }, "execution_count": 83, "outputs": [] }, { "cell_type": "code", "source": [ "# Let's define target and input value\n", "X = ml_data.drop('Fraud_indicator', axis =1)\n", "y = ml_data['Fraud_indicator']" ], "metadata": { "id": "Qv8Hv72QZROW" }, "execution_count": 84, "outputs": [] }, { "cell_type": "code", "source": [ "# Let's divide dataset to train and test data\n", "from sklearn.model_selection import train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" ], "metadata": { "id": "S4Kkk4sEZS-k" }, "execution_count": 85, "outputs": [] }, { "cell_type": "code", "source": [ "# Let's import necessary libraries\n", "import pandas as pd\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.neighbors import KNeighborsClassifier\n", "from sklearn.tree import DecisionTreeClassifier\n", "from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier\n", "from sklearn.svm import SVC\n", "from sklearn.naive_bayes import GaussianNB\n", "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score\n", "\n", "# Define classifiers\n", "classifiers = {\n", " \"Logistic 
Regression\": LogisticRegression(max_iter=1000),\n", "    \"K Nearest Neighbors\": KNeighborsClassifier(),\n", "    \"Decision Tree\": DecisionTreeClassifier(),\n", "    \"Random Forest\": RandomForestClassifier(),\n", "    \"Gradient Boosting\": GradientBoostingClassifier(),\n", "    \"Support Vector Classification\": SVC(),\n", "    \"AdaBoost\": AdaBoostClassifier(),\n", "    \"Naive Bayes\": GaussianNB()\n", "}\n", "\n", "# Train each classifier and report weighted metrics on the test set\n", "for key, classifier in classifiers.items():\n", "    classifier.fit(X_train, y_train)\n", "    y_pred = classifier.predict(X_test)\n", "\n", "    accuracy = accuracy_score(y_test, y_pred)\n", "    precision = precision_score(y_test, y_pred, average='weighted')\n", "    recall = recall_score(y_test, y_pred, average='weighted')\n", "    f1 = f1_score(y_test, y_pred, average='weighted')\n", "\n", "    print(\"Classifier:\", key)\n", "    print(\"Accuracy:\", round(accuracy * 100, 2), \"%\")\n", "    print(\"Precision:\", round(precision * 100, 2), \"%\")\n", "    print(\"Recall:\", round(recall * 100, 2), \"%\")\n", "    print(\"F1 Score:\", round(f1 * 100, 2), \"%\")\n", "    print()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Ccls_K3PXz3M", "outputId": "d88d000f-1356-47f1-cd79-088fc5b62f1a" }, "execution_count": 57, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Classifier: Logistic Regression\n", "Accuracy: 98.3 %\n", "Precision: 98.34 %\n", "Recall: 98.3 %\n", "F1 Score: 98.27 %\n", "\n", "Classifier: K Nearest Neighbors\n", "Accuracy: 99.1 %\n", "Precision: 99.11 %\n", "Recall: 99.1 %\n", "F1 Score: 99.09 %\n", "\n", "Classifier: Decision Tree\n", "Accuracy: 100.0 %\n", "Precision: 100.0 %\n", "Recall: 100.0 %\n", "F1 Score: 100.0 %\n", "\n", "Classifier: Random Forest\n", "Accuracy: 99.9 %\n", "Precision: 99.9 %\n", "Recall: 99.9 %\n", "F1 Score: 99.9 %\n", "\n", "Classifier: Gradient Boosting\n", "Accuracy: 100.0 %\n", "Precision: 100.0 %\n", "Recall: 100.0 %\n", "F1 Score: 100.0 %\n", "\n", "Classifier: Support Vector Classification\n", 
"Accuracy: 98.3 %\n", "Precision: 98.34 %\n", "Recall: 98.3 %\n", "F1 Score: 98.27 %\n", "\n", "Classifier: AdaBoost\n", "Accuracy: 100.0 %\n", "Precision: 100.0 %\n", "Recall: 100.0 %\n", "F1 Score: 100.0 %\n", "\n", "Classifier: Naive Bayes\n", "Accuracy: 50.0 %\n", "Precision: 82.94 %\n", "Recall: 50.0 %\n", "F1 Score: 51.96 %\n", "\n" ] } ] }, { "cell_type": "code", "source": [ "# Let's continue DecisionTreeRegressor -> Random Search\n", "\n", "from sklearn.model_selection import RandomizedSearchCV\n", "from sklearn.tree import DecisionTreeRegressor\n", "\n", "param_grid = {\n", " 'max_depth': [None, 10, 20, 30, 40],\n", " 'min_samples_split': [2, 5, 10],\n", " 'min_samples_leaf': [1, 2, 4]\n", "}\n", "\n", "dt_regressor = DecisionTreeRegressor()\n", "\n", "dt_random = RandomizedSearchCV(estimator=dt_regressor, param_distributions=param_grid,\n", " n_iter=100, cv=3, verbose=2, random_state=42, n_jobs=-1)\n", "\n", "dt_random.fit(X_train, y_train)\n", "\n", "print(\"Best parameters found for Decision Tree Regression:\")\n", "print(dt_random.best_params_)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "WhXDVNgbbaUU", "outputId": "b8d2e13f-bb70-4a2f-a2c6-8bd132c29f49" }, "execution_count": 28, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Fitting 3 folds for each of 45 candidates, totalling 135 fits\n" ] }, { "output_type": "stream", "name": "stderr", "text": [ "/usr/local/lib/python3.10/dist-packages/sklearn/model_selection/_search.py:305: UserWarning: The total space of parameters 45 is smaller than n_iter=100. Running 45 iterations. 
For exhaustive searches, use GridSearchCV.\n", " warnings.warn(\n" ] }, { "output_type": "stream", "name": "stdout", "text": [ "Best parameters found for Decision Tree Regression:\n", "{'min_samples_split': 2, 'min_samples_leaf': 1, 'max_depth': None}\n" ] } ] }, { "cell_type": "markdown", "source": [ "## Decision Tree Classification is the Selected Model for this Task\n", "In the comparison above, the Decision Tree classifier scored 100% accuracy, precision, recall, and F1 on the test set (matched by Gradient Boosting and AdaBoost), so it is the model trained and saved here." ], "metadata": { "id": "BMNhTnz4iIIa" } }, { "cell_type": "code", "source": [ "import pickle\n", "from sklearn.tree import DecisionTreeClassifier\n", "\n", "# Let's train the final classifier with the best parameters found by the search\n", "dt = DecisionTreeClassifier(min_samples_split=2, min_samples_leaf=1, max_depth=None, random_state=42)\n", "\n", "dt.fit(X_train, y_train)\n", "\n", "# Let's save the model as a pickle file\n", "with open('dt_model.pkl', 'wb') as file:\n", "    pickle.dump(dt, file)\n", "\n", "print(\"Model saved as dt_model.pkl\")" ], "metadata": { "id": "gIeL2M68qIMC", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "d3f78a98-167f-4798-e9b0-5fcb7dcb9fdc" }, "execution_count": 86, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Model saved as dt_model.pkl\n" ] } ] }, { "cell_type": "code", "source": [ "X_train.columns" ], "metadata": { "id": "OThjl6Llw--v", "outputId": "3c88b22b-8f3d-4f69-ffc2-d278e3187923", "colab": { "base_uri": "https://localhost:8080/" } }, "execution_count": 88, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Index(['Transaction_Amount', 'Amount_paid', 'Vehicle_Speed', 'Percent_paid',\n", " 'Vehicle_Type_Car', 'Vehicle_Type_Motorcycle', 'Vehicle_Type_SUV',\n", " 'Vehicle_Type_Sedan', 'Vehicle_Type_Truck', 'Vehicle_Type_Van',\n", " 'Lane_Type_Regular', 'TollBoothID_B-102', 'TollBoothID_C-103',\n", " 'TollBoothID_D-104', 'TollBoothID_D-105', 'TollBoothID_D-106',\n", " 'Vehicle_Dimensions_Medium', 'Vehicle_Dimensions_Small',\n", " 'Vehicle_Speed_group_21-41', 
'Vehicle_Speed_group_41-93',\n", " 'Vehicle_Speed_group_93-103', 'Vehicle_Speed_group_<= 21',\n", " 'Transaction_Amount_group_330 <', 'Transaction_Amount_group_60-180',\n", " 'Transaction_Amount_group_<= 60'],\n", " dtype='object')" ] }, "metadata": {}, "execution_count": 88 } ] }, { "cell_type": "code", "source": [ "# Let's look at input values\n", "X_train.columns" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "IEuk2z6ITgtr", "outputId": "e0680821-1a8c-40cc-c53c-9e7f52d30db2" }, "execution_count": 59, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Index(['Transaction_Amount', 'Amount_paid', 'Vehicle_Speed', 'Percent_paid',\n", " 'Vehicle_Type_Car', 'Vehicle_Type_Motorcycle', 'Vehicle_Type_SUV',\n", " 'Vehicle_Type_Sedan', 'Vehicle_Type_Truck', 'Vehicle_Type_Van',\n", " 'Lane_Type_Regular', 'TollBoothID_B-102', 'TollBoothID_C-103',\n", " 'TollBoothID_D-104', 'TollBoothID_D-105', 'TollBoothID_D-106',\n", " 'Vehicle_Dimensions_Medium', 'Vehicle_Dimensions_Small',\n", " 'Vehicle_Speed_group_21-41', 'Vehicle_Speed_group_41-93',\n", " 'Vehicle_Speed_group_93-103', 'Vehicle_Speed_group_<= 21',\n", " 'Transaction_Amount_group_330 <', 'Transaction_Amount_group_60-180',\n", " 'Transaction_Amount_group_<= 60'],\n", " dtype='object')" ] }, "metadata": {}, "execution_count": 59 } ] }, { "cell_type": "markdown", "source": [ "# Neural Network Model Development" ], "metadata": { "id": "N2VZaKBBlPtD" } }, { "cell_type": "code", "source": [ "# Let's copy dataset\n", "nn_data = df.copy()" ], "metadata": { "id": "pWJy3NNOlJ48" }, "execution_count": 31, "outputs": [] }, { "cell_type": "code", "source": [ "# Let's define the target and input features from the neural network copy\n", "X = nn_data.drop('Fraud_indicator', axis=1)\n", "y = nn_data['Fraud_indicator']" ], "metadata": { "id": "gxeu3DxVd_4g" }, "execution_count": 33, "outputs": [] }, { "cell_type": "code", "source": [ "# Let's split the dataset into train and test sets\n", "from sklearn.model_selection import 
train_test_split\n", "\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)" ], "metadata": { "id": "NWbSOOLkd_yo" }, "execution_count": 34, "outputs": [] }, { "cell_type": "code", "source": [ "from sklearn.model_selection import train_test_split\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.metrics import classification_report, accuracy_score, precision_score, recall_score, f1_score\n", "from tensorflow.keras.models import Sequential\n", "from tensorflow.keras.layers import Dense, Dropout\n", "from tensorflow.keras.callbacks import EarlyStopping\n", "\n", "\n", "# Standardize the features\n", "scaler = StandardScaler()\n", "X_train_scaled = scaler.fit_transform(X_train)\n", "X_test_scaled = scaler.transform(X_test)\n", "\n", "# Build the neural network model for classification\n", "model = Sequential([\n", " Dense(64, activation='relu', input_shape=(X_train_scaled.shape[1],)),\n", " Dropout(0.2),\n", " Dense(32, activation='relu'),\n", " Dropout(0.2),\n", " Dense(1, activation='sigmoid') # Output layer for binary classification\n", "])\n", "\n", "# Compile the model\n", "model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])\n", "\n", "# Define early stopping to prevent overfitting\n", "early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)\n", "\n", "# Train the model\n", "history = model.fit(X_train_scaled, y_train, epochs=10, batch_size=32, validation_split=0.2, callbacks=[early_stopping])\n", "\n", "# Predict on the test set\n", "y_pred_prob = model.predict(X_test_scaled)\n", "y_pred = (y_pred_prob > 0.5).astype(int) # Convert probabilities to binary predictions\n", "\n", "# Evaluate the model using classification metrics\n", "accuracy = accuracy_score(y_test, y_pred)\n", "precision = precision_score(y_test, y_pred)\n", "recall = recall_score(y_test, y_pred)\n", "f1 = f1_score(y_test, y_pred)\n", "\n", "print(\"Accuracy:\", 
round(accuracy * 100, 2), \"%\")\n", "print(\"Precision:\", round(precision * 100, 2), \"%\")\n", "print(\"Recall:\", round(recall * 100, 2), \"%\")\n", "print(\"F1 Score:\", round(f1 * 100, 2), \"%\")\n", "\n", "# Print detailed classification report\n", "print(\"\\nClassification Report:\\n\", classification_report(y_test, y_pred))" ], "metadata": { "id": "gHe_ALIxiOGl", "colab": { "base_uri": "https://localhost:8080/" }, "outputId": "20791004-0ff0-456c-d2a6-7bb25de4f84c" }, "execution_count": 36, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Epoch 1/10\n", "100/100 [==============================] - 1s 4ms/step - loss: 0.4956 - accuracy: 0.7534 - val_loss: 0.3198 - val_accuracy: 0.8625\n", "Epoch 2/10\n", "100/100 [==============================] - 0s 2ms/step - loss: 0.2969 - accuracy: 0.8737 - val_loss: 0.2005 - val_accuracy: 0.9225\n", "Epoch 3/10\n", "100/100 [==============================] - 0s 3ms/step - loss: 0.1929 - accuracy: 0.9394 - val_loss: 0.1197 - val_accuracy: 0.9688\n", "Epoch 4/10\n", "100/100 [==============================] - 0s 2ms/step - loss: 0.1404 - accuracy: 0.9591 - val_loss: 0.0882 - val_accuracy: 0.9750\n", "Epoch 5/10\n", "100/100 [==============================] - 0s 3ms/step - loss: 0.1188 - accuracy: 0.9675 - val_loss: 0.0674 - val_accuracy: 0.9787\n", "Epoch 6/10\n", "100/100 [==============================] - 0s 2ms/step - loss: 0.1036 - accuracy: 0.9737 - val_loss: 0.0511 - val_accuracy: 0.9887\n", "Epoch 7/10\n", "100/100 [==============================] - 0s 2ms/step - loss: 0.0865 - accuracy: 0.9806 - val_loss: 0.0468 - val_accuracy: 0.9862\n", "Epoch 8/10\n", "100/100 [==============================] - 0s 2ms/step - loss: 0.0751 - accuracy: 0.9828 - val_loss: 0.0472 - val_accuracy: 0.9887\n", "Epoch 9/10\n", "100/100 [==============================] - 0s 3ms/step - loss: 0.0675 - accuracy: 0.9844 - val_loss: 0.0351 - val_accuracy: 0.9887\n", "Epoch 10/10\n", "100/100 [==============================] 
- 0s 2ms/step - loss: 0.0609 - accuracy: 0.9894 - val_loss: 0.0337 - val_accuracy: 0.9900\n", "32/32 [==============================] - 0s 2ms/step\n", "Accuracy: 98.5 %\n", "Precision: 100.0 %\n", "Recall: 93.09 %\n", "F1 Score: 96.42 %\n", "\n", "Classification Report:\n", " precision recall f1-score support\n", "\n", " 0 0.98 1.00 0.99 783\n", " 1 1.00 0.93 0.96 217\n", "\n", " accuracy 0.98 1000\n", " macro avg 0.99 0.97 0.98 1000\n", "weighted avg 0.99 0.98 0.98 1000\n", "\n" ] } ] }, { "cell_type": "code", "source": [ "# Optionally, visualize training history\n", "import matplotlib.pyplot as plt\n", "\n", "plt.plot(history.history['loss'], label='Training Loss')\n", "plt.plot(history.history['val_loss'], label='Validation Loss')\n", "plt.xlabel('Epoch')\n", "plt.ylabel('Loss')\n", "plt.legend()\n", "plt.show()" ], "metadata": { "id": "sXVGr03siOAi", "colab": { "base_uri": "https://localhost:8080/", "height": 449 }, "outputId": "06544a56-8376-4b8c-c46f-56664d18b0c5" }, "execution_count": 37, "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "\n" }, "metadata": {} } ] }, { "cell_type": "markdown", "source": [ "# Model Evaluation" ], "metadata": { "id": "cjKuqBQImuXq" } }, { "cell_type": "code", "source": [ "from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report\n", "\n", "# Let's predict for Decision Tree Classification\n", "dt_y_pred = dt.predict(X_test)\n", "\n", "# Let's calculate evaluation metrics for classification\n", "dt_accuracy = accuracy_score(y_test, dt_y_pred)\n", "dt_precision = precision_score(y_test, dt_y_pred, average='weighted')\n", "dt_recall = recall_score(y_test, dt_y_pred, average='weighted')\n", "dt_f1 = f1_score(y_test, dt_y_pred, average='weighted')\n", "\n", "print(\"Decision Tree Classification Evaluation Metrics:\")\n", "print(\"Accuracy:\", round(dt_accuracy * 100, 2), \"%\")\n", "print(\"Precision:\", round(dt_precision * 100, 2), \"%\")\n", "print(\"Recall:\", round(dt_recall * 100, 2), \"%\")\n", "print(\"F1 Score:\", round(dt_f1 * 100, 2), \"%\")\n", "\n", "# Print detailed classification report\n", "print(\"\\nClassification Report:\\n\", classification_report(y_test, dt_y_pred))" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "S8lDViZvmvnS", "outputId": "fce84480-d8aa-416f-9b07-16e81c7e7304" }, "execution_count": 60, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Decision Tree Classification Evaluation Metrics:\n", "Accuracy: 100.0 %\n", "Precision: 100.0 %\n", "Recall: 100.0 %\n", "F1 Score: 100.0 %\n", "\n", "Classification Report:\n", " precision recall f1-score support\n", "\n", " 0 1.00 1.00 1.00 783\n", " 1 1.00 1.00 1.00 217\n", "\n", " accuracy 1.00 1000\n", " macro avg 1.00 1.00 1.00 1000\n", "weighted avg 1.00 1.00 1.00 1000\n", "\n" ] } ] }, { "cell_type": "markdown", "source": [ "# ML Pipelines and Model Deployment" ], "metadata": { "id": "HhbrXLr-oaQS" } }, { "cell_type": "code", "source": [ "import pandas as pd\n", "import pickle\n", 
"\n", "# Load the trained Decision Tree Classification model\n", "with open('dt_model.pkl', 'rb') as file:\n", " dt_model = pickle.load(file)\n", "\n", "# Define preprocessing function\n", "def preprocess_input(data):\n", " df = pd.DataFrame(data, index=[0])\n", "\n", " # Let's find which percent are paid\n", " df['Percent_paid'] = df.apply(lambda row: row['Amount_paid'] / row['Transaction_Amount'] if row['Transaction_Amount'] != 0 else 0, axis=1)\n", "\n", " # Let's also group Vehicle_Speed, Transaction_Amount columns\n", " def Vehicle_Speed_group(speed):\n", " if speed <= 21:\n", " return '<= 21'\n", " elif speed <= 41:\n", " return '21-41'\n", " elif speed <= 93:\n", " return '41-93'\n", " elif speed <= 103:\n", " return '93-103'\n", " else:\n", " return '103 <'\n", "\n", " def Transaction_Amount_group(amount):\n", " if amount <= 60:\n", " return '<= 60'\n", " elif amount <= 180:\n", " return '60-180'\n", " elif amount <= 330:\n", " return '180-330'\n", " else:\n", " return '330 <'\n", "\n", " df['Vehicle_Speed_group'] = df['Vehicle_Speed'].apply(lambda x: Vehicle_Speed_group(x))\n", " df['Transaction_Amount_group'] = df['Transaction_Amount'].apply(lambda x: Transaction_Amount_group(x))\n", "\n", " # Let's encode categorical features\n", " df = pd.get_dummies(df, columns=['Vehicle_Type', 'Lane_Type','TollBoothID', 'Vehicle_Dimensions','Vehicle_Speed_group','Transaction_Amount_group'], drop_first=True)\n", "\n", " # Let's first remove unnecessary columns\n", " df.drop(columns = ['Transaction_ID','Timestamp','FastagID','Geographical_Location','Vehicle_Plate_Number'], inplace = True)\n", "\n", " # Let's convert all bool to int\n", " for col in df.select_dtypes(include=['bool', 'object']):\n", " try:\n", " df[col] = df[col].astype(int)\n", " except ValueError:\n", " df[col] = df[col].astype(float).round().astype(int)\n", "\n", " columns_to_add = ['Transaction_Amount', 'Amount_paid', 'Vehicle_Speed', 'Percent_paid',\n", " 'Vehicle_Type_Car', 
'Vehicle_Type_Motorcycle', 'Vehicle_Type_SUV',\n", " 'Vehicle_Type_Sedan', 'Vehicle_Type_Truck', 'Vehicle_Type_Van',\n", " 'Lane_Type_Regular', 'TollBoothID_B-102', 'TollBoothID_C-103',\n", " 'TollBoothID_D-104', 'TollBoothID_D-105', 'TollBoothID_D-106',\n", " 'Vehicle_Dimensions_Medium', 'Vehicle_Dimensions_Small',\n", " 'Vehicle_Speed_group_21-41', 'Vehicle_Speed_group_41-93',\n", " 'Vehicle_Speed_group_93-103', 'Vehicle_Speed_group_<= 21',\n", " 'Transaction_Amount_group_330 <', 'Transaction_Amount_group_60-180',\n", " 'Transaction_Amount_group_<= 60']\n", "\n", " # Add missing columns with default value of 0\n", " for column in columns_to_add:\n", " if column not in df.columns:\n", " df[column] = 0\n", "\n", " return df\n", "\n", "# Define function to detect fraud\n", "def predict_salary(transaction_id, timestamp, vehicle_type, fastag_id, tollbooth_id, lane_type, vehicle_dimensions, transaction_amount, amount_paid, geographical_location, vehicle_speed, vehicle_plate_number):\n", " # Preprocess input data\n", " input_data = preprocess_input({\n", " \"Transaction_ID\": transaction_id,\n", " \"Timestamp\": timestamp,\n", " \"Vehicle_Type\": vehicle_type,\n", " \"FastagID\": fastag_id,\n", " \"TollBoothID\": tollbooth_id,\n", " \"Lane_Type\": lane_type,\n", " \"Vehicle_Dimensions\": vehicle_dimensions,\n", " \"Transaction_Amount\": transaction_amount,\n", " \"Amount_paid\": amount_paid,\n", " \"Geographical_Location\": geographical_location,\n", " \"Vehicle_Speed\": vehicle_speed,\n", " \"Vehicle_Plate_Number\": vehicle_plate_number\n", " })\n", "\n", " input_data = input_data[['Transaction_Amount', 'Amount_paid', 'Vehicle_Speed', 'Percent_paid',\n", " 'Vehicle_Type_Car', 'Vehicle_Type_Motorcycle', 'Vehicle_Type_SUV',\n", " 'Vehicle_Type_Sedan', 'Vehicle_Type_Truck', 'Vehicle_Type_Van',\n", " 'Lane_Type_Regular', 'TollBoothID_B-102', 'TollBoothID_C-103',\n", " 'TollBoothID_D-104', 'TollBoothID_D-105', 'TollBoothID_D-106',\n", " 'Vehicle_Dimensions_Medium', 
'Vehicle_Dimensions_Small',\n", " 'Vehicle_Speed_group_21-41', 'Vehicle_Speed_group_41-93',\n", " 'Vehicle_Speed_group_93-103', 'Vehicle_Speed_group_<= 21',\n", " 'Transaction_Amount_group_330 <', 'Transaction_Amount_group_60-180',\n", " 'Transaction_Amount_group_<= 60']]\n", "\n", " # Predict Fraud using the trained model\n", " fraud_prediction = dt_model.predict(input_data)\n", " fraud_prediction = ['Fraud❌' if fraud_prediction[0] == 1 else 'Not Fraud✅']\n", " result = f'Transaction ID {transaction_id}, the predicted fraud based on your details is {\"Fraud\" if fraud_prediction[0] == 1 else \"Not Fraud\"}'\n", " return result" ], "metadata": { "id": "9FC9svFciN7f" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "Deploy (Gradio)" ], "metadata": { "id": "rvUmlz9aFhJE" } }, { "cell_type": "code", "source": [ "!pip install gradio" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "s9dxQCdns-8Z", "outputId": "ab4361d1-44da-43c0-8121-0d572c50ad99" }, "execution_count": 43, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Collecting gradio\n", " Downloading gradio-4.32.1-py3-none-any.whl (12.3 MB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m12.3/12.3 MB\u001b[0m \u001b[31m24.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting aiofiles<24.0,>=22.0 (from gradio)\n", " Downloading aiofiles-23.2.1-py3-none-any.whl (15 kB)\n", "Requirement already satisfied: altair<6.0,>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (4.2.2)\n", "Collecting fastapi (from gradio)\n", " Downloading fastapi-0.111.0-py3-none-any.whl (91 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m92.0/92.0 kB\u001b[0m \u001b[31m9.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting ffmpy (from gradio)\n", " Downloading ffmpy-0.3.2.tar.gz (5.5 kB)\n", " Preparing metadata (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n", "Collecting gradio-client==0.17.0 (from gradio)\n", " Downloading gradio_client-0.17.0-py3-none-any.whl (316 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m316.3/316.3 kB\u001b[0m \u001b[31m29.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting httpx>=0.24.1 (from gradio)\n", " Downloading httpx-0.27.0-py3-none-any.whl (75 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m10.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: huggingface-hub>=0.19.3 in /usr/local/lib/python3.10/dist-packages (from gradio) (0.23.1)\n", "Requirement already satisfied: importlib-resources<7.0,>=1.3 in /usr/local/lib/python3.10/dist-packages (from gradio) (6.4.0)\n", "Requirement already satisfied: jinja2<4.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.1.4)\n", "Requirement already satisfied: markupsafe~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.1.5)\n", "Requirement already satisfied: matplotlib~=3.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (3.7.1)\n", "Requirement already satisfied: numpy~=1.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (1.25.2)\n", "Collecting orjson~=3.0 (from gradio)\n", " Downloading orjson-3.10.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (142 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m142.5/142.5 kB\u001b[0m \u001b[31m16.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from gradio) (24.0)\n", "Requirement already satisfied: pandas<3.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.0.3)\n", "Requirement already satisfied: pillow<11.0,>=8.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (9.4.0)\n", 
"Requirement already satisfied: pydantic>=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.7.1)\n", "Collecting pydub (from gradio)\n", " Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)\n", "Collecting python-multipart>=0.0.9 (from gradio)\n", " Downloading python_multipart-0.0.9-py3-none-any.whl (22 kB)\n", "Requirement already satisfied: pyyaml<7.0,>=5.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (6.0.1)\n", "Collecting ruff>=0.2.2 (from gradio)\n", " Downloading ruff-0.4.6-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.8 MB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m8.8/8.8 MB\u001b[0m \u001b[31m77.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting semantic-version~=2.0 (from gradio)\n", " Downloading semantic_version-2.10.0-py2.py3-none-any.whl (15 kB)\n", "Collecting tomlkit==0.12.0 (from gradio)\n", " Downloading tomlkit-0.12.0-py3-none-any.whl (37 kB)\n", "Collecting typer<1.0,>=0.12 (from gradio)\n", " Downloading typer-0.12.3-py3-none-any.whl (47 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m47.2/47.2 kB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: typing-extensions~=4.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (4.11.0)\n", "Requirement already satisfied: urllib3~=2.0 in /usr/local/lib/python3.10/dist-packages (from gradio) (2.0.7)\n", "Collecting uvicorn>=0.14.0 (from gradio)\n", " Downloading uvicorn-0.30.0-py3-none-any.whl (62 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m62.4/62.4 kB\u001b[0m \u001b[31m7.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from gradio-client==0.17.0->gradio) (2023.6.0)\n", "Collecting websockets<12.0,>=10.0 (from gradio-client==0.17.0->gradio)\n", " 
Downloading websockets-11.0.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (129 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m129.9/129.9 kB\u001b[0m \u001b[31m14.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: entrypoints in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (0.4)\n", "Requirement already satisfied: jsonschema>=3.0 in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (4.19.2)\n", "Requirement already satisfied: toolz in /usr/local/lib/python3.10/dist-packages (from altair<6.0,>=4.2.0->gradio) (0.12.1)\n", "Requirement already satisfied: anyio in /usr/local/lib/python3.10/dist-packages (from httpx>=0.24.1->gradio) (3.7.1)\n", "Requirement already satisfied: certifi in /usr/local/lib/python3.10/dist-packages (from httpx>=0.24.1->gradio) (2024.2.2)\n", "Collecting httpcore==1.* (from httpx>=0.24.1->gradio)\n", " Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m77.9/77.9 kB\u001b[0m \u001b[31m9.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: idna in /usr/local/lib/python3.10/dist-packages (from httpx>=0.24.1->gradio) (3.7)\n", "Requirement already satisfied: sniffio in /usr/local/lib/python3.10/dist-packages (from httpx>=0.24.1->gradio) (1.3.1)\n", "Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx>=0.24.1->gradio)\n", " Downloading h11-0.14.0-py3-none-any.whl (58 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m6.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.19.3->gradio) (3.14.0)\n", "Requirement already satisfied: requests in 
/usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.19.3->gradio) (2.31.0)\n", "Requirement already satisfied: tqdm>=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub>=0.19.3->gradio) (4.66.4)\n", "Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (1.2.1)\n", "Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (0.12.1)\n", "Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (4.51.0)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (1.4.5)\n", "Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (3.1.2)\n", "Requirement already satisfied: python-dateutil>=2.7 in /usr/local/lib/python3.10/dist-packages (from matplotlib~=3.0->gradio) (2.8.2)\n", "Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0,>=1.0->gradio) (2023.4)\n", "Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas<3.0,>=1.0->gradio) (2024.1)\n", "Requirement already satisfied: annotated-types>=0.4.0 in /usr/local/lib/python3.10/dist-packages (from pydantic>=2.0->gradio) (0.7.0)\n", "Requirement already satisfied: pydantic-core==2.18.2 in /usr/local/lib/python3.10/dist-packages (from pydantic>=2.0->gradio) (2.18.2)\n", "Requirement already satisfied: click>=8.0.0 in /usr/local/lib/python3.10/dist-packages (from typer<1.0,>=0.12->gradio) (8.1.7)\n", "Collecting shellingham>=1.3.0 (from typer<1.0,>=0.12->gradio)\n", " Downloading shellingham-1.5.4-py2.py3-none-any.whl (9.8 kB)\n", "Requirement already satisfied: rich>=10.11.0 in /usr/local/lib/python3.10/dist-packages (from typer<1.0,>=0.12->gradio) (13.7.1)\n", 
"Collecting starlette<0.38.0,>=0.37.2 (from fastapi->gradio)\n", " Downloading starlette-0.37.2-py3-none-any.whl (71 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m71.9/71.9 kB\u001b[0m \u001b[31m7.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting fastapi-cli>=0.0.2 (from fastapi->gradio)\n", " Downloading fastapi_cli-0.0.4-py3-none-any.whl (9.5 kB)\n", "Collecting ujson!=4.0.2,!=4.1.0,!=4.2.0,!=4.3.0,!=5.0.0,!=5.1.0,>=4.0.1 (from fastapi->gradio)\n", " Downloading ujson-5.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (53 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m53.6/53.6 kB\u001b[0m \u001b[31m6.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting email_validator>=2.0.0 (from fastapi->gradio)\n", " Downloading email_validator-2.1.1-py3-none-any.whl (30 kB)\n", "Collecting dnspython>=2.0.0 (from email_validator>=2.0.0->fastapi->gradio)\n", " Downloading dnspython-2.6.1-py3-none-any.whl (307 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m307.7/307.7 kB\u001b[0m \u001b[31m25.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: attrs>=22.2.0 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (23.2.0)\n", "Requirement already satisfied: jsonschema-specifications>=2023.03.6 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (2023.12.1)\n", "Requirement already satisfied: referencing>=0.28.4 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (0.35.1)\n", "Requirement already satisfied: rpds-py>=0.7.1 in /usr/local/lib/python3.10/dist-packages (from jsonschema>=3.0->altair<6.0,>=4.2.0->gradio) (0.18.1)\n", "Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from 
python-dateutil>=2.7->matplotlib~=3.0->gradio) (1.16.0)\n", "Requirement already satisfied: markdown-it-py>=2.2.0 in /usr/local/lib/python3.10/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio) (3.0.0)\n", "Requirement already satisfied: pygments<3.0.0,>=2.13.0 in /usr/local/lib/python3.10/dist-packages (from rich>=10.11.0->typer<1.0,>=0.12->gradio) (2.16.1)\n", "Requirement already satisfied: exceptiongroup in /usr/local/lib/python3.10/dist-packages (from anyio->httpx>=0.24.1->gradio) (1.2.1)\n", "Collecting httptools>=0.5.0 (from uvicorn>=0.14.0->gradio)\n", " Downloading httptools-0.6.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (341 kB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m341.4/341.4 kB\u001b[0m \u001b[31m28.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting python-dotenv>=0.13 (from uvicorn>=0.14.0->gradio)\n", " Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)\n", "Collecting uvloop!=0.15.0,!=0.15.1,>=0.14.0 (from uvicorn>=0.14.0->gradio)\n", " Downloading uvloop-0.19.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m3.4/3.4 MB\u001b[0m \u001b[31m58.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hCollecting watchfiles>=0.13 (from uvicorn>=0.14.0->gradio)\n", " Downloading watchfiles-0.22.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)\n", "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.2/1.2 MB\u001b[0m \u001b[31m46.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", "\u001b[?25hRequirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->huggingface-hub>=0.19.3->gradio) (3.3.2)\n", "Requirement already satisfied: mdurl~=0.1 in /usr/local/lib/python3.10/dist-packages (from 
markdown-it-py>=2.2.0->rich>=10.11.0->typer<1.0,>=0.12->gradio) (0.1.2)\n", "Building wheels for collected packages: ffmpy\n", " Building wheel for ffmpy (setup.py) ... \u001b[?25l\u001b[?25hdone\n", " Created wheel for ffmpy: filename=ffmpy-0.3.2-py3-none-any.whl size=5584 sha256=b42c86124f3267644b6ac43645d02865a9d0eccdcd60de9bd4e32544d227d44b\n", " Stored in directory: /root/.cache/pip/wheels/bd/65/9a/671fc6dcde07d4418df0c592f8df512b26d7a0029c2a23dd81\n", "Successfully built ffmpy\n", "Installing collected packages: pydub, ffmpy, websockets, uvloop, ujson, tomlkit, shellingham, semantic-version, ruff, python-multipart, python-dotenv, orjson, httptools, h11, dnspython, aiofiles, watchfiles, uvicorn, starlette, httpcore, email_validator, typer, httpx, gradio-client, fastapi-cli, fastapi, gradio\n", " Attempting uninstall: typer\n", " Found existing installation: typer 0.9.4\n", " Uninstalling typer-0.9.4:\n", " Successfully uninstalled typer-0.9.4\n", "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. 
This behaviour is the source of the following dependency conflicts.\n", "spacy 3.7.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.\n", "weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.\u001b[0m\u001b[31m\n", "\u001b[0mSuccessfully installed aiofiles-23.2.1 dnspython-2.6.1 email_validator-2.1.1 fastapi-0.111.0 fastapi-cli-0.0.4 ffmpy-0.3.2 gradio-4.32.1 gradio-client-0.17.0 h11-0.14.0 httpcore-1.0.5 httptools-0.6.1 httpx-0.27.0 orjson-3.10.3 pydub-0.25.1 python-dotenv-1.0.1 python-multipart-0.0.9 ruff-0.4.6 semantic-version-2.10.0 shellingham-1.5.4 starlette-0.37.2 tomlkit-0.12.0 typer-0.12.3 ujson-5.10.0 uvicorn-0.30.0 uvloop-0.19.0 watchfiles-0.22.0 websockets-11.0.3\n" ] } ] }, { "cell_type": "code", "source": [
"import pandas as pd\n",
"import pickle\n",
"import gradio as gr\n",
"\n",
"# Define preprocessing function\n",
"def preprocess_input(data):\n",
"    df = pd.DataFrame(data, index=[0])\n",
"\n",
"    # Fraction of the transaction amount that was actually paid (0 when the amount is 0)\n",
"    df['Percent_paid'] = df.apply(lambda row: row['Amount_paid'] / row['Transaction_Amount'] if row['Transaction_Amount'] != 0 else 0, axis=1)\n",
"\n",
"    # Group the Vehicle_Speed and Transaction_Amount columns into bins\n",
"    def Vehicle_Speed_group(speed):\n",
"        if speed <= 21:\n",
"            return '<= 21'\n",
"        elif speed <= 41:\n",
"            return '21-41'\n",
"        elif speed <= 93:\n",
"            return '41-93'\n",
"        elif speed <= 103:\n",
"            return '93-103'\n",
"        else:\n",
"            return '103 <'\n",
"\n",
"    def Transaction_Amount_group(amount):\n",
"        if amount <= 60:\n",
"            return '<= 60'\n",
"        elif amount <= 180:\n",
"            return '60-180'\n",
"        elif amount <= 330:\n",
"            return '180-330'\n",
"        else:\n",
"            return '330 <'\n",
"\n",
"    df['Vehicle_Speed_group'] = df['Vehicle_Speed'].apply(Vehicle_Speed_group)\n",
"    df['Transaction_Amount_group'] = df['Transaction_Amount'].apply(Transaction_Amount_group)\n",
"\n",
"    # Encode categorical features\n",
"    df = 
pd.get_dummies(df, columns=['Vehicle_Type', 'Lane_Type', 'TollBoothID', 'Vehicle_Dimensions', 'Vehicle_Speed_group', 'Transaction_Amount_group'])\n",
"    # drop_first is deliberately not used here: with a single input row it would drop\n",
"    # every dummy column, so all categorical features would always be 0.\n",
"\n",
"    # Remove columns that are not used as model features\n",
"    df.drop(columns=['Transaction_ID', 'Timestamp', 'FastagID', 'Geographical_Location', 'Vehicle_Plate_Number'], inplace=True)\n",
"\n",
"    # Convert bool (and any remaining object) columns to int\n",
"    for col in df.select_dtypes(include=['bool', 'object']):\n",
"        try:\n",
"            df[col] = df[col].astype(int)\n",
"        except ValueError:\n",
"            df[col] = df[col].astype(float).round().astype(int)\n",
"\n",
"    model_columns = ['Transaction_Amount', 'Amount_paid', 'Vehicle_Speed', 'Percent_paid',\n",
"                     'Vehicle_Type_Car', 'Vehicle_Type_Motorcycle', 'Vehicle_Type_SUV',\n",
"                     'Vehicle_Type_Sedan', 'Vehicle_Type_Truck', 'Vehicle_Type_Van',\n",
"                     'Lane_Type_Regular', 'TollBoothID_B-102', 'TollBoothID_C-103',\n",
"                     'TollBoothID_D-104', 'TollBoothID_D-105', 'TollBoothID_D-106',\n",
"                     'Vehicle_Dimensions_Medium', 'Vehicle_Dimensions_Small',\n",
"                     'Vehicle_Speed_group_21-41', 'Vehicle_Speed_group_41-93',\n",
"                     'Vehicle_Speed_group_93-103', 'Vehicle_Speed_group_<= 21',\n",
"                     'Transaction_Amount_group_330 <', 'Transaction_Amount_group_60-180',\n",
"                     'Transaction_Amount_group_<= 60']\n",
"\n",
"    # Add missing columns with a default value of 0, discard the baseline dummy columns,\n",
"    # and return the features in the column order listed above\n",
"    return df.reindex(columns=model_columns, fill_value=0)\n",
"\n",
"# Load the trained decision tree classification model\n",
"with open('dt_model.pkl', 'rb') as file:\n",
"    dt_model = pickle.load(file)\n",
"\n",
"# Define function to detect fraud\n",
"def predict_fraud(transaction_id, timestamp, vehicle_type, fastag_id, tollbooth_id, lane_type, vehicle_dimensions, transaction_amount, amount_paid, geographical_location, vehicle_speed, vehicle_plate_number):\n",
"    # Preprocess input data\n",
"    input_data = preprocess_input({\n",
"        \"Transaction_ID\": transaction_id,\n",
"        \"Timestamp\": timestamp,\n",
"        \"Vehicle_Type\": vehicle_type,\n",
"        \"FastagID\": fastag_id,\n",
"        \"TollBoothID\": tollbooth_id,\n",
"        \"Lane_Type\": lane_type,\n",
"        \"Vehicle_Dimensions\": vehicle_dimensions,\n",
"        \"Transaction_Amount\": transaction_amount,\n",
"        \"Amount_paid\": amount_paid,\n",
"        \"Geographical_Location\": geographical_location,\n",
"        \"Vehicle_Speed\": vehicle_speed,\n",
"        \"Vehicle_Plate_Number\": vehicle_plate_number\n",
"    })\n",
"\n",
"    # Predict fraud using the trained model (a predicted label of 1 is treated as fraud)\n",
"    prediction = dt_model.predict(input_data)[0]\n",
"    label = 'Fraud❌' if prediction == 1 else 'Not Fraud✅'\n",
"    return f'Transaction ID {transaction_id}: the prediction based on your details is {label}'\n",
"\n",
"\n",
"# Define Gradio interface\n",
"interface = gr.Interface(\n",
"    fn=predict_fraud,\n",
"    inputs=[\n",
"        gr.Textbox(label=\"Transaction ID\"),\n",
"        gr.Textbox(label=\"Timestamp\"),\n",
"        gr.Dropdown(['Bus ','Car','Motorcycle','Truck','Van','Sedan','SUV'], label=\"Vehicle Type\"),\n",
"        gr.Textbox(label=\"Fastag ID\"),\n",
"        gr.Dropdown(['A-101','B-102','D-104','C-103','D-105','D-106'], label=\"Toll Booth ID\"),\n",
"        gr.Dropdown(['Express','Regular'], label=\"Lane Type\"),\n",
"        gr.Dropdown(['Large','Small','Medium'], label=\"Vehicle Dimensions\"),\n",
"        gr.Number(label=\"Transaction Amount\"),\n",
"        gr.Number(label=\"Amount Paid\"),\n",
"        gr.Textbox(label=\"Geographical Location\"),\n",
"        gr.Number(label=\"Vehicle Speed\"),\n",
"        gr.Textbox(label=\"Vehicle Plate Number\")\n",
"    ],\n",
"    outputs=gr.Textbox(label=\"Fraud Prediction\"),\n",
"    title=\"Fraud Detection Model\",\n",
"    description=\"Enter transaction details to predict whether the transaction is fraudulent.\"\n",
")\n",
"\n",
"# Launch the Gradio interface\n",
"interface.launch()" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 653 }, "id": "cbPDEtuCp9zj", "outputId": "15f0138e-e423-43b6-be32-537d7a8bc0c9" }, "execution_count": 89, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).\n", "\n", "Colab notebook detected. To show errors in colab notebook, set debug=True in launch()\n", "Running on public URL: https://abfad13b115a2f01a3.gradio.live\n", "\n", "This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)\n" ] }, { "output_type": "display_data", "data": { "text/plain": [ "" ], "text/html": [ "
" ] }, "metadata": {} }, { "output_type": "execute_result", "data": { "text/plain": [] }, "metadata": {}, "execution_count": 89 } ] }, { "cell_type": "markdown", "source": [ "# 🔚The End" ], "metadata": { "id": "QzaMh4h6F-Lc" } } ] }