{ "cells": [ { "cell_type": "markdown", "id": "7259c012", "metadata": {}, "source": [ "# TDCE Learning Model Basic Model Construction\n", "\n", "This notebook display the basic construction of Time Driven Cost Estimation Learning Model step by step without using the experiment script (experiment_script.py)" ] }, { "cell_type": "markdown", "id": "050178b5", "metadata": {}, "source": [ "Import the library" ] }, { "cell_type": "code", "execution_count": null, "id": "ae7385ba", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import importlib\n", "import sys\n", "import os\n", "import time\n", "import numpy as np\n", "import requests\n", "import ipywidgets as widgets\n", "from IPython.display import display\n", "import git" ] }, { "cell_type": "markdown", "id": "bf7b5cc7", "metadata": {}, "source": [ "If run from online source or Google Collab please run this code for clone the model file" ] }, { "cell_type": "code", "execution_count": null, "id": "4bb874d3", "metadata": {}, "outputs": [], "source": [ "run_from_online = True\n", "ignore_download_dataset = False\n", "\n", "\n", "if run_from_online:\n", " try:\n", " repo = git.Repo.clone_from('https://huggingface.co/iaecpsu-1/tdce-basic',\n", " './tdce-basic',\n", " branch='main')\n", " except git.exc.GitCommandError:\n", " print(\"Repository already exists or cannot be cloned. Continuing with existing files.\")\n", " pass\n", "\n", " # fmt:off\n", " sys.path.append('./tdce-basic/model')\n", " sys.path.append('./tdce-basic/functions/matrix_generator')\n", " sys.path.append('./tdce-basic/functions')\n", " sys.path.append('./tdce-basic/functions/data_extractor')\n", "\n", " try:\n", " # For online execution, in Google Colab\n", " !pip install tensor-sensor\n", " except:\n", " pass\n", " \n", " import tdce_model as tdce\n", " import material_fc_layer as mfl\n", " import employee_fc_layer as efl\n", " import capital_fc_layer as cfl\n", " import loss\n", " import cost_matrix_class as cmc\n", " import display_input_variation as diva\n", " import viyacrab_augmentation as viya\n", " import adjust_data as ajd\n", " import result_display as rd\n", " import mini_plot as mp\n", " # fmt:on\n", " \n", "\n", "else :\n", " # fmt:off\n", " sys.path.append('../model')\n", " sys.path.append('../functions/matrix_generator')\n", " sys.path.append('../functions')\n", " sys.path.append('../functions/data_extractor')\n", " \n", " import tdce_model as tdce\n", " import material_fc_layer as mfl\n", " import employee_fc_layer as efl\n", " import capital_fc_layer as cfl\n", " import loss\n", " import cost_matrix_class as cmc\n", " import display_input_variation as diva\n", " import viyacrab_augmentation as viya\n", " import adjust_data as ajd\n", " import result_display as rd\n", " import mini_plot as mp\n", " # fmt:on\n", "\n", "importlib.reload(tdce)\n", "importlib.reload(mfl)\n", "importlib.reload(efl)\n", "importlib.reload(cfl)\n", "importlib.reload(loss)\n", "importlib.reload(tdce)\n", "importlib.reload(cmc)\n", "importlib.reload(diva)\n", "importlib.reload(viya)\n", "importlib.reload(ajd)\n", "importlib.reload(rd)\n", "importlib.reload(mp)" ] }, { "cell_type": "markdown", "id": "6b7fd72c", "metadata": {}, "source": [ "## Dataset \n", "We will use our project dataset for experimental, we pick the [extended-random-dataset](https://huggingface.co/datasets/theethawats98/tdce-example-extended-random) which is the dataset with high dimension but moderate variation to use as case study for out demonstation. We create the dataset in folder `datasets` and then inside it have the folder `extended-random` again. We will create the folder if it is not exist and download the datafile from the our huggingface." ] }, { "cell_type": "code", "execution_count": null, "id": "aa31e7f6", "metadata": {}, "outputs": [], "source": [ "try:\n", " os.mkdir('result')\n", " os.mkdir(f\"datasets\")\n", " os.mkdir(f\"datasets/extended-random\")\n", "except FileExistsError:\n", " print(\"Folder is Exist\")\n", " pass" ] }, { "cell_type": "markdown", "id": "ca939290", "metadata": {}, "source": [ "Download Files" ] }, { "cell_type": "code", "execution_count": null, "id": "f98a61ed", "metadata": {}, "outputs": [], "source": [ "def download_file():\n", " capital_cost_link = \"https://huggingface.co/datasets/theethawats98/tdce-example-extended-random/resolve/main/generated_capital_cost.csv\"\n", " capital_path = 'datasets/extended-random/generated_capital_cost.csv'\n", " employee_usage_link = \"https://huggingface.co/datasets/theethawats98/tdce-example-extended-random/resolve/main/generated_employee_usage.csv\"\n", " employee_path = 'datasets/extended-random/generated_employee_usage.csv'\n", " material_usage_link = \"https://huggingface.co/datasets/theethawats98/tdce-example-extended-random/resolve/main/generated_material_usage.csv\"\n", " material_path = 'datasets/extended-random/generated_material_usage.csv'\n", " process_data_link = \"https://huggingface.co/datasets/theethawats98/tdce-example-extended-random/resolve/main/generated_process_data.csv\"\n", " process_path = 'datasets/extended-random/generated_process_data.csv'\n", "\n", "\n", " for link, path in [\n", " (capital_cost_link, capital_path),\n", " (employee_usage_link, employee_path),\n", " (material_usage_link, material_path),\n", " (process_data_link, process_path)\n", " ]:\n", " if not os.path.exists(path):\n", " response = requests.get(link)\n", " if response.status_code == 200:\n", " with open(path, 'wb') as file:\n", " file.write(response.content)\n", " print(f'File {path} downloaded successfully')\n", " else:\n", " print(f'Failed to download file {path}')\n", " # Downloading the datasets\n", " print(\"Downloading datasets...\")\n", "\n", "if (not ignore_download_dataset):\n", " download_file()" ] }, { "cell_type": "markdown", "id": "8b97d5b2", "metadata": {}, "source": [ "## Model Setting\n", "Select the correct setting for your model." ] }, { "cell_type": "code", "execution_count": null, "id": "3a387bbe", "metadata": {}, "outputs": [], "source": [ "hour_day_employee_widget = widgets.BoundedIntText(\n", " value=8,\n", " min=1,\n", " max=24,\n", " step=1,\n", " description='Hours per Day for Employee:',\n", " disabled=False\n", ")\n", "\n", "\n", "hour_day_capital_cost_widget = widgets.BoundedIntText(\n", " value=21,\n", " min=1,\n", " max=24,\n", " step=1,\n", " description='Hours per Day for Utility / Capital Cost:',\n", " disabled=False\n", ")\n", "\n", "use_outlier_removal_widget = widgets.Checkbox(\n", " value=False,\n", " description='Enable Outlier Removal',\n", " disabled=False,\n", " indent=False\n", ")\n", "\n", "\n", "\n", "outlier_index_widget = widgets.Dropdown(\n", " options=['1', '1.5', '2'],\n", " value='1.5',\n", " description='Removal Idication Index:',\n", " disabled=False,\n", ")\n", "\n", "use_augmentation_widget = widgets.Checkbox(\n", " value=False,\n", " description='Enable Data Augmentation',\n", " disabled=False,\n", " indent=False\n", ")\n", "\n", "use_early_stopping_widget = widgets.Checkbox(\n", " value=False,\n", " description='Enable Early Stopping',\n", " disabled=False,\n", " indent=False\n", ")\n", "\n", "early_stopping_patience_widget = widgets.BoundedIntText(\n", " value=10,\n", " min=1,\n", " max=100,\n", " step=1,\n", " description='Early Stopping Patience Round:',\n", " disabled=False\n", ")\n", "\n", "element_level_lr_widget = widgets.Dropdown(\n", " options=['0.001','0.05', '0.01','0.1','0.5'],\n", " value='0.01',\n", " description='Element Level Learning Rate:',\n", " disabled=False,\n", ")\n", "\n", "model_level_lr_widget = widgets.Dropdown(\n", " options=['0.0000001','0.00000001','0.000000001'],\n", " value='0.00000001',\n", " description='Model Level Learning Rate:',\n", " disabled=False,\n", ")\n", "epoch_widget = widgets.BoundedIntText(\n", " value=100,\n", " min=10,\n", " max=1000,\n", " step=10,\n", " description='Number of Epochs:',\n", " disabled=False\n", ")\n", "\n", "\n", "display(hour_day_employee_widget)\n", "display(hour_day_capital_cost_widget)\n", "display(use_outlier_removal_widget)\n", "display(outlier_index_widget)\n", "display(use_augmentation_widget)\n", "display(use_early_stopping_widget)\n", "display(early_stopping_patience_widget)\n", "display(element_level_lr_widget)\n", "display(model_level_lr_widget)\n", "display(epoch_widget)" ] }, { "cell_type": "markdown", "id": "204af1e1", "metadata": {}, "source": [ "If you config the value on the widget, please reexecute all of these bottom cells for create a model according to your data.\n", "\n", "\n", "Getting Value from Widget" ] }, { "cell_type": "code", "execution_count": null, "id": "8e7c42b1", "metadata": {}, "outputs": [], "source": [ "hour_day_employee = hour_day_employee_widget.value\n", "hour_day_capital_cost = hour_day_capital_cost_widget.value\n", "use_outlier_removal= use_outlier_removal_widget.value\n", "outlier_index = float(outlier_index_widget.value)\n", "use_augmentation = use_augmentation_widget.value\n", "element_level_lr = float(element_level_lr_widget.value)\n", "model_level_lr = float(model_level_lr_widget.value)\n", "use_early_stopping =use_early_stopping_widget.value\n", "early_stopping_patience = early_stopping_patience_widget.value\n", "epoch_number = epoch_widget.value" ] }, { "cell_type": "markdown", "id": "842aa666", "metadata": {}, "source": [ "Generated Cost Matrix" ] }, { "cell_type": "code", "execution_count": null, "id": "f7ed24d9", "metadata": {}, "outputs": [], "source": [ "folder_path = 'datasets/extended-random'\n", "output_folder_path = f'result/{model_level_lr}'\n", "\n", "cost_generator = cmc.CostMatrixGenerator()\n", "cost_generator.change_data_directory(folder_path)\n", "cost_generator.load_data()\n", "input_variation = diva.display_input_variation_by_directory(folder_path)\n", "input_variation.to_csv(f\"{folder_path}/data_variation.csv\")\n", "\n", "(process_df,employee_usage,material_usage,capital_cost_usage) = cost_generator.get_data()" ] }, { "cell_type": "markdown", "id": "4ea56434", "metadata": {}, "source": [ "If you want to see the raw data, you can display using `process_df`, `process_df.head()`, `employee_df` and so on." ] }, { "cell_type": "markdown", "id": "3bd65eb3", "metadata": {}, "source": [ "## Pre-Processing\n", "djust and filter in input datasets (Material, Employee, Capital Cost) to match the output dataset (Process Dataset) and Depend on Outlier Removal Condition" ] }, { "cell_type": "code", "execution_count": null, "id": "cec6dcf7", "metadata": {}, "outputs": [], "source": [ "if use_outlier_removal:\n", " cost_generator.remove_outlier_iqr(outlier_index)\n", " (\n", " new_process_df,\n", " new_employee_usage,\n", " new_material_usage,\n", " new_capital_cost_usage,\n", " ) = cost_generator.get_data()\n", " (new_capital_cost_usage, new_employee_usage, new_material_usage) = (\n", " ajd.adjust_to_match_process(\n", " capital_cost_usage=new_capital_cost_usage,\n", " employee_usage=new_employee_usage,\n", " material_usage=new_material_usage,\n", " new_process_df=new_process_df,\n", " )\n", " )\n", " new_variation = diva.display_input_variation(\n", " new_process_df,\n", " new_material_usage,\n", " new_employee_usage,\n", " new_capital_cost_usage,\n", " )\n", " new_variation.to_csv(f\"{folder_path}/data_variation_after_outlier.csv\")\n", " new_process_df.to_csv(f\"{folder_path}/process_df_after_outlier.csv\")\n", " try:\n", " print(\"Data Variation After Outlier Removed\")\n", " display(new_variation)\n", " except Exception as e:\n", " print(\"This is not Jupyter Notebook\", e)\n", "else:\n", " display(input_variation)" ] }, { "cell_type": "markdown", "id": "00976358", "metadata": {}, "source": [ "### Data Splitting\n", "Split training and validation dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "d1904264", "metadata": {}, "outputs": [], "source": [ "(\n", " train_process_df,\n", " train_employee_usage,\n", " train_material_usage,\n", " train_capital_cost,\n", " validate_process_df,\n", " validate_employee_usage,\n", " validate_material_usage,\n", " validate_capital_cost,\n", ") = cost_generator.train_test_split_without_matrix(0.7)\n", "\n", "\n", "# Generate Cost Matrix for Validation Set\n", "validation_payload = cost_generator.get_validation_payload(\n", " validate_process_df\n", ")\n", "\n", "\n", "# Display Variation of Train Data\n", "train_variation = diva.display_input_variation(\n", " train_process_df,\n", " train_material_usage,\n", " train_employee_usage,\n", " train_capital_cost,\n", ")\n", "train_variation.to_csv(\n", " f\"{folder_path}/train_data_variation.csv\")\n", "train_process_df.to_csv(\n", " f\"{folder_path}/train_process_df.csv\")\n", "# Display Variation of Validation Data\n", "validate_variation = diva.display_input_variation(\n", " validate_process_df,\n", " validate_material_usage,\n", " validate_employee_usage,\n", " validate_capital_cost,\n", ")\n", "validate_variation.to_csv(\n", " f\"{folder_path}/validate_data_variation.csv\"\n", ")\n", "if use_augmentation:\n", " # TODO: Increase the Generalization of the Model\n", " # Augmented the Imbalance Class of Training Data\n", " train_process_df.to_csv(\n", " f\"{folder_path}/train_process_df_before_augmented.csv\"\n", " )\n", " train_process_df = viya.vy_training_augmentation(\n", " train_process_df)\n", " # Display Variation of Train Data After Augmented\n", " train_variation = diva.display_input_variation(\n", " train_process_df,\n", " train_material_usage,\n", " train_employee_usage,\n", " train_capital_cost,\n", " )\n", " train_variation.to_csv(\n", " f\"{folder_path}/train_data_variation_after_augmented.csv\"\n", " )\n", " train_process_df.to_csv(\n", " f\"{folder_path}/train_process_df_after_augmented_{round}.csv\"\n", " )\n", " \n", "# Generate Matrix From Training Set\n", "(\n", " material_cost_matrix,\n", " material_amount_matrix,\n", " employee_cost_matrix,\n", " employee_duration_matrix,\n", " employee_day_amount_matrix,\n", " capital_cost_matrix,\n", " day_amount_matrix,\n", " capital_cost_duration_matrix, # New On Finetune\n", " result_matrix,\n", ") = cost_generator.generate_data_from_input(\n", " train_process_df,\n", " train_material_usage,\n", " train_employee_usage,\n", " train_capital_cost,\n", ")" ] }, { "cell_type": "markdown", "id": "d1bdcbbc", "metadata": {}, "source": [ "## Initial Model\n", "\n", "### Initial Layer\n", "Create function to initial layer from cost matrix, it will automatically find what the input size is need for due to the data" ] }, { "cell_type": "code", "execution_count": null, "id": "2fce43cf", "metadata": {}, "outputs": [], "source": [ "def inital_layer(\n", " material_cost_matrix,\n", " employee_cost_matrix,\n", " capital_cost_matrix,\n", "):\n", " total_col = 0\n", " # Material FC Layer\n", " row, high, col = material_cost_matrix.shape\n", " material_layer_1 = mfl.MaterialFCLayer(col, 1)\n", " # material_layer_1.annotate(material_cost_matrix, material_amount_matrix)\n", " total_col += col\n", "\n", " # Monthy Employee FC Layer\n", " row, high, col = employee_cost_matrix.shape\n", " employee_layer_1 = efl.EmployeeFCLayer(col, 1, 8)\n", " total_col += col\n", " # monthy_employee_layer_1.annotate(monthy_employee_cost_matrix, duration_matrix)\n", "\n", " # Capital Cost FC Layer\n", " row, high, col = capital_cost_matrix.shape\n", " capital_cost_layer1 = cfl.CapitalCostFCLayer(col, 1, 21)\n", " total_col += col\n", " # capital_cost_layer1.annotate(\n", " # capital_cost_matrix, life_time_matrix, machine_hour_matrix, duration_matrix)\n", "\n", " return (\n", " material_layer_1,\n", " employee_layer_1,\n", " capital_cost_layer1,\n", " )\n" ] }, { "cell_type": "markdown", "id": "662b0bd2", "metadata": {}, "source": [ "### Model Initialization\n", "Create the model object from its class" ] }, { "cell_type": "code", "execution_count": null, "id": "53684466", "metadata": {}, "outputs": [], "source": [ "# Initial Model\n", "tdce_model = tdce.TDCEModel()\n", " \n", "# Create the Layer\n", "(\n", " material_layer_1,\n", " employee_layer_1,\n", " capital_cost_layer1,\n", ") = inital_layer(\n", " capital_cost_matrix=capital_cost_matrix,\n", " employee_cost_matrix=employee_cost_matrix,\n", " material_cost_matrix=material_cost_matrix,\n", ")\n", "\n", "# Add the Layer to the Model\n", "tdce_model.inital_inside_element(\n", " material_layer=material_layer_1,\n", " capital_cost_layer=capital_cost_layer1,\n", " employee_layer=employee_layer_1,\n", ")\n", "\n", "# Install the error calculator\n", "tdce_model.use(loss=loss.mse, loss_prime=loss.mse_prime,\n", " loss_percent=loss.rmspe)\n", "\n", "# Set Early Stopping\n", "if use_early_stopping:\n", " tdce_model.activate_early_stopping()\n", " tdce_model.edit_patience_round(early_stopping_patience)" ] }, { "cell_type": "markdown", "id": "df6d4796", "metadata": {}, "source": [ "Setting the learning rate" ] }, { "cell_type": "code", "execution_count": null, "id": "b75da97b", "metadata": {}, "outputs": [], "source": [ "# Use the same learning rate for all element level\n", "tdce_model.set_learning_rate(\n", " element_level_lr,element_level_lr,element_level_lr\n", ")\n", "\n", "# activate model weight\n", "tdce_model.activete_model_weight()" ] }, { "cell_type": "markdown", "id": "38df7ced", "metadata": {}, "source": [ "## Training\n", "\n", "Fit a model with input and output data" ] }, { "cell_type": "code", "execution_count": null, "id": "2fe52d20", "metadata": {}, "outputs": [], "source": [ "start_time = time.time()\n", "\n", "tdce_model.fit_with_validation(\n", " epoch=epoch_number,\n", " learning_rate=model_level_lr,\n", " material_amount_matrix=material_amount_matrix,\n", " material_cost_matrix=material_cost_matrix,\n", " employee_cost_matrix=employee_cost_matrix,\n", " employee_duration_matrix=employee_duration_matrix,\n", " employee_day_amount_matrix=employee_day_amount_matrix,\n", " result_matrix=result_matrix,\n", " capital_cost_matrix=capital_cost_matrix,\n", " day_amount_matrix=day_amount_matrix,\n", " validation_payload=validation_payload,\n", " capital_cost_duration_matrix=capital_cost_duration_matrix,\n", " )\n", "\n", "end_time = time.time()\n", "time_usage = end_time - start_time\n", "print(f\"Learning Rate: {model_level_lr} / {element_level_lr}\")\n", "print(f\"Time Using {time_usage} Second\")\n" ] }, { "cell_type": "markdown", "id": "39b9aa25", "metadata": {}, "source": [ "Get the Error Listing" ] }, { "cell_type": "code", "execution_count": null, "id": "7fa00c92", "metadata": {}, "outputs": [], "source": [ "try:\n", " os.mkdir(f\"{output_folder_path}\")\n", "except FileExistsError:\n", " print(\"Folder is Exist\")\n", " pass" ] }, { "cell_type": "code", "execution_count": null, "id": "8861bedb", "metadata": {}, "outputs": [], "source": [ "error_list = tdce_model.get_epoch_error()\n", "sample_error_list = tdce_model.get_sample_error()\n", "error_df = pd.DataFrame(error_list)\n", "sample_error_df = pd.DataFrame(sample_error_list)\n", "\n", "\n", "# Get Overall Output\n", "minimum_error = error_df[\"error\"].min()\n", "minimum_percent_error = error_df[\"error_percent\"].min()\n", "minimum_validate_error = error_df[\"validate_error\"].min()\n", "minimum_validate_percent_error = error_df[\"validate_error_percent\"].min()\n", "\n", "# Export the Output Result\n", "error_df.to_csv(f\"{output_folder_path}/{epoch_number}-{element_level_lr}.csv\")\n", "sample_error_df.to_csv(\n", " f\"{output_folder_path}/error-list-{epoch_number}-{element_level_lr}.csv\"\n", ")\n", "sample_payload = tdce_model.get_sample_payload()\n", "sample_payload_df = pd.DataFrame(sample_payload)\n", "\n", "# Export all the sample / history of all training\n", "sample_payload_df.to_csv(\n", " f\"{output_folder_path}/sample-payload-list-{epoch_number}-{element_level_lr}.csv\",\n", " index=False,\n", ")\n", "\n", "print(f\"Minimum Error: {minimum_error}\"\n", " f\" / Minimum Percent Error (RMSPE): {minimum_percent_error}\"\n", " f\" / Minimum Validate Error: {minimum_validate_error}\"\n", " f\" / Minimum Validate Percent Error (RMSEP): {minimum_validate_percent_error}\")\n" ] }, { "cell_type": "markdown", "id": "639b9b91", "metadata": {}, "source": [ "## Visualization\n", "\n", "### Error Behavior\n", "Display the model training behavior" ] }, { "cell_type": "code", "execution_count": null, "id": "dc3bdc9d", "metadata": {}, "outputs": [], "source": [ "mp.plotting_learning_curve(epoch_error=error_df,element_learning_rate=element_level_lr,\n", " model_learning_rate=model_level_lr)" ] }, { "cell_type": "markdown", "id": "0c101da0", "metadata": {}, "source": [ "### Weight Adjustment Behavior\n", "Display the Weight of Each Model Element" ] }, { "cell_type": "code", "execution_count": null, "id": "bbd09ca5", "metadata": {}, "outputs": [], "source": [ "importlib.reload(mp)\n", "mp.plot_model_level_weight(adjustment_data=sample_payload_df,epoch_error=error_df)" ] }, { "cell_type": "markdown", "id": "3a99ea9c", "metadata": {}, "source": [ "Display the weight of each material, labor, and utility cost object." ] }, { "cell_type": "code", "execution_count": null, "id": "283c58fe", "metadata": {}, "outputs": [], "source": [ "material_columns = [col for col in sample_payload_df.columns if col.startswith('material_weight_')]\n", "employee_columns = [col for col in sample_payload_df.columns if col.startswith('employee_weight_')]\n", "capital_columns = [col for col in sample_payload_df.columns if col.startswith('capital_cost_weight_')]\n", "\n", "\n", "importlib.reload(mp)\n", "mp.plot_element_level_weight(\n", " adjustment_data=sample_payload_df,\n", " material_columns=material_columns,labor_columns= employee_columns,\n", " utility_columns= capital_columns)" ] }, { "cell_type": "markdown", "id": "d6cc2707", "metadata": {}, "source": [ "## Export Model\n", "Export Model to keep and use in another place" ] }, { "cell_type": "code", "execution_count": null, "id": "74b2f7e2", "metadata": {}, "outputs": [], "source": [ "tdce_model.export_model(\"tdce_model.pkl\")" ] }, { "cell_type": "markdown", "id": "dce151ec", "metadata": {}, "source": [ "© 2025, Intelligent Automation Engineering Center, Prince of Songkla University" ] } ], "metadata": { "kernelspec": { "display_name": "venv", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.2" } }, "nbformat": 4, "nbformat_minor": 5 }