kawaiipeace committed on
Commit cc2e1db · 1 Parent(s): d4d1ca8

Update Function

.DS_Store CHANGED
Binary files a/.DS_Store and b/.DS_Store differ
 
.dockerignore ADDED
@@ -0,0 +1,14 @@
1
+ __pycache__/
2
+ venv/
3
+ .venv/
4
+ .git/
5
+ .gitignore
6
+ outputs/
7
+ data/
8
+ .env
9
+ *.pyc
10
+ *.pkl
11
+ *.joblib
12
+ *.csv
13
+ __MACOSX
14
+ *.log
.gitignore CHANGED
@@ -1,7 +1,13 @@
 
1
  .env
2
- outputs/
 
3
  __pycache__/
4
  scripts/__pycache__
5
  .venv/
6
  .python-version
7
- README.md
 
 
 
 
 
1
+ README.md
2
  .env
3
+ data/*
4
+ outputs/*
5
  __pycache__/
6
  scripts/__pycache__
7
  .venv/
8
  .python-version
9
+ .ipynb_checkpoints
10
+ .DS_Store
11
+ .vscode/
12
+ .idea/
13
+ *.log
Dockerfile ADDED
@@ -0,0 +1,42 @@
1
+ # Start from official slim Python 3.12 image
2
+ FROM python:3.12.9-slim
3
+
4
+ # Set working directory
5
+ WORKDIR /app
6
+
7
+ # Install system dependencies required by some packages (geopandas, prophet/cmdstanpy, tensorflow)
8
+ RUN apt-get update && apt-get install -y --no-install-recommends \
9
+ build-essential \
10
+ git \
11
+ curl \
12
+ gcc \
13
+ g++ \
14
+ libgeos-dev \
15
+ libproj-dev \
16
+ proj-data \
17
+ proj-bin \
18
+ libgdal-dev \
19
+ pkg-config \
20
+ wget \
21
+ unzip \
22
+ ca-certificates \
23
+ && rm -rf /var/lib/apt/lists/*
24
+
25
+ # Ensure pip is up to date
26
+ RUN python -m pip install --upgrade pip setuptools wheel
27
+
28
+ # Copy requirements and install Python dependencies
29
+ COPY requirements.txt /app/requirements.txt
30
+ RUN pip install --no-cache-dir -r /app/requirements.txt
31
+
32
+ # Copy application source
33
+ COPY . /app
34
+
35
+ # Create outputs directory
36
+ RUN mkdir -p /app/outputs /app/data
37
+
38
+ # Expose ports commonly used by Gradio and Uvicorn
39
+ EXPOSE 7860 8000
40
+
41
+ # Default command: run the app with python (Gradio will launch)
42
+ CMD ["python", "app.py"]
README.md CHANGED
@@ -1,190 +1,115 @@
1
  # OMS Analyze — Prototype
 
2
 
3
- โปรเจกต์นี้เป็นแอปพลิเคชันต้นแบบสำหรับวิเคราะห์ข้อมูลการดับไฟฟ้า (OMS - Outage Management System) โดยใช้ AI และ Machine Learning เพื่อสรุป สืบหาความผิดปกติ พยากรณ์ และจำแนกสาเหตุ
4
 
5
- แอปสร้างด้วย Gradio สำหรับใช้งานผ่านเว็บเบราว์เซอร์ และรองรับการอัปโหลดไฟล์ CSV เพื่อวิเคราะห์
6
 
7
- ## การติดตั้งและรัน
8
 
9
- ### ข้อกำหนด
10
- - Python 3.12.9
11
- - ติดตั้งแพ็คเกจจาก `requirements.txt`:
12
- ```
13
- pip install -r requirements.txt
14
- ```
15
 
16
- ### การตั้งค่าสภาพแวดล้อมด้วย pyenv (แนะนำ)
17
 
18
- ตัวอย่างคำสั่งที่ผมใช้เพื่อเตรียม environment บน macOS (แนะนำใช้ pyenv เพื่อจัดการเวอร์ชัน Python):
19
 
20
  ```
21
- cd /Users/peace/Documents/projects/huggingface/AI-OMS-Analyze
 
22
  pyenv install 3.12.9
23
  pyenv local 3.12.9
24
  /Users/$(whoami)/.pyenv/versions/3.12.9/bin/python -m venv .venv
25
  source .venv/bin/activate
26
  python -m pip install --upgrade pip setuptools wheel
27
  python -m pip install -r requirements.txt
 
28
  ```
29
 
30
- หมายเหตุ:
31
- - ผมคอมเมนต์ `prophet` ใน `requirements.txt` เพราะการติดตั้งจะพยายามคอมไพล์ CmdStan (C++ build) ซึ่งใช้เวลานานและมักต้องการเครื่องมือเพิ่มเติม (Xcode command line tools). ถ้าต้องการติดตั้ง `prophet` ให้ uncomment แล้วเตรียมเครื่องมือ build.
32
- - ผมอัปเดต `fsspec` และ `openai` เป็นเวอร์ชันที่มีอยู่ใน PyPI เพื่อให้ pip หาแพ็กเกจเจอได้
33
-
34
- - สำหรับการใช้ Hugging Face Router (สำหรับ LLM ใน Summarization และ Classification): ตั้งค่า `HF_TOKEN` ในไฟล์ `.env` (ดู `.env.example`)
35
-
36
- ### Windows (ใช้ pyenv-win)
37
-
38
- ถ้าคุณใช้ Windows ผมแนะนำติดตั้ง `pyenv-win` เพื่อจัดการเวอร์ชัน Python (หรือใช้ Python ที่ติดตั้งผ่าน Chocolatey / Microsoft Store แล้วสร้าง venv ตามปกติ)
39
-
40
- 1) ติดตั้ง pyenv-win (PowerShell - Run as Administrator แนะนำ)
41
-
42
- ```
43
- # ผ่าน PowerShell (Admin)
44
- Invoke-WebRequest -UseBasicParsing -Uri "https://pyenv.run" -OutFile "pyenv-win-install.ps1"
45
- Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser -Force
46
- .\pyenv-win-install.ps1
47
-
48
- # หรือ ติดตั้งด้วย Scoop (ถ้ามี scoop)
49
- scoop install pyenv
50
-
51
- แอปแบ่งเป็นแท็บหลัก (UI) ดังนี้ — คำอธิบายสรุปการใช้งานและไฟล์ผลลัพธ์ที่ระบบจะสร้างในโฟลเดอร์ `outputs/`:
52
-
53
- ### 1. Upload & Preview
54
- - วัตถุประสงค์: อัปโหลดไฟล์ CSV และตรวจสอบตัวอย่างข้อมูล
55
- - วิธีใช้:
56
- 1. คลิก "Upload CSV" แล้วเลือกไฟล์ (ตัวอย่าง: `data/data_1.csv`, `data/data_3.csv`)
57
- 2. ดูตัวอย่าง 10 แถวแรกและตรวจสอบชนิดของคอลัมน์
58
- - ผลลัพธ์: preview เท่านั้น — การรันฟังก์ชันวิเคราะห์จะสร้างผลลัพธ์ลงใน `outputs/`
59
-
60
- ### 2. Summarization
61
- - วัตถุประสงค์: สร้างสรุปภาษาธรรมชาติสำหรับแต่ละเหตุการณ์หรือชุดข้อมูล
62
- - วิธีใช้:
63
- 1. อัปโหลดไฟล์ CSV
64
- 2. ระบุแถวที่ต้องการสรุป (เว้นว่าง = ทั้งหมด; หรือระบุ index เช่น `0,1,2`)
65
- 3. ถ้าต้องการใช้ LLM ให้เปิด "Use Hugging Face Router" และตั้ง `HF_TOKEN` ใน `.env`
66
- 4. เลือกระดับรายละเอียด (brief / medium / detailed) แล้วคลิก "Generate Summaries"
67
- - ผลลัพธ์: ไฟล์ `outputs/summaries_from_ui.csv` (UI run) หรือ `outputs/summaries.csv` (batch export)
68
-
69
- ### 3. Anomaly Detection
70
- - วัตถุประสงค์: ตรวจหาเหตุการณ์ผิดปกติด้วย ML (IsolationForest, LOF)
71
- - วิธีใช้:
72
- 1. อัปโหลดไฟล์ CSV
73
- 2. เลือก algorithm: `both` (IsolationForest + LOF), `iso`, หรือ `lof`
74
- 3. ปรับค่า contamination (0.01–0.2, default 0.05)
75
- 4. คลิก "Run Anomaly Detection"
76
- - ผลลัพธ์: `outputs/anomalies_from_ui.csv` (UI run) หรือ `outputs/anomalies.csv` (batch). รายการ suspects จะถูกบันทึกเป็น `outputs/ntl_suspects.csv` ถ้ามีการตรวจจับ
77
-
78
- ### 4. Forecasting
79
- - วัตถุประสงค์: พยากรณ์จำนวนเหตุการณ์หรือ downtime ในอนาคต
80
- - วิธีใช้:
81
- 1. อัปโหลดไฟล์ CSV
82
- 2. เลือก metric: `count` หรือ `downtime_minutes`
83
- 3. ตั้ง horizon (7–90 วัน, default 14)
84
- 4. คลิก "Run Forecast"
85
- - ผลลัพธ์: `outputs/forecast_count_from_ui.csv`, `outputs/forecast_downtime_minutes_from_ui.csv` (UI run) หรือ batch outputs `outputs/forecast_count.csv`, `outputs/forecast_downtime_minutes.csv`
86
- (หากต้องการความแม่นยำมากขึ้น ให้ติดตั้ง `prophet==1.1.7`)
87
-
88
- ### 5. Classification
89
- - วัตถุประสงค์: จำแนกสาเหตุของเหตุการณ์ (root-cause classification)
90
- - วิธีใช้:
91
- 1. อัปโหลดไฟล์ CSV
92
- 2. (Optional) เปิด "Run weak-labeling using HF" เพื่อให้ LLM สร้าง weak labels (`HF_TOKEN` ต้องถูกตั้ง)
93
- 3. (Optional) เลือก "Run GridSearch" สำหรับ tuning
94
- 4. คลิก "Train Classifier"
95
- - ผลลัพธ์: โมเดลและผลลัพธ์จะถูกบันทึกลง `outputs/`, ตัวอย่างไฟล์:
96
- - `outputs/rf_cause_pipeline.joblib`
97
- - `outputs/predictions_cause.csv`
98
- - ถ้ามีหลายโมเดล: `outputs/predictions_gb_CauseType.csv`, `outputs/predictions_mlp_CauseType.csv`, `outputs/predictions_rf_CauseType.csv`
99
-
100
- python -m pip install --upgrade pip setuptools wheel
101
- python -m pip install -r requirements.txt
102
- ```
103
-
104
- หมายเหตุ สำหรับ Windows:
105
- - บางแพ็กเกจ (เช่น `geopandas`, `fiona`, `pyproj`, `shapely`) มี native dependencies (GDAL, PROJ) — การใช้ Miniforge/Conda จะทำให้ง่ายกว่า (conda-forge)
106
- - `prophet` มักจะต้องการการติดตั้ง CmdStan/cmdstanpy ล่วงหน้า และอาจจะซับซ้อนบน Windows — พิจารณาใช้ WSL2 หรือ conda สำหรับงานนี้
107
 
108
-
109
- ### รันแอป
110
- ```
 
 
 
 
 
111
  python app.py
112
  ```
113
- แอปจะรันที่ `http://127.0.0.1:7860` (หรือปรับ port ด้วย `--server.port PORT`)
114
 
115
- ### โครงสร้างไฟล์
116
- - `data/data.csv`: ไฟล์ข้อมูลตัวอย่าง
117
- - `scripts/`: โมดูล Python สำหรับแต่ละฟีเจอร์
118
- - `outputs/`: ไฟล์ผลลัพธ์จากการวิเคราะห์
119
- - `app.py`: แอป Gradio หลัก
120
-
121
- ## คู่มือการใช้งาน
122
-
123
- แอปแบ่งเป็นแท็บต่างๆ ดังนี้:
124
 
125
  ### 1. Upload & Preview
126
- - **วัตถุประสงค์**: อัปโหลดไฟล์ CSV และดูตัวอย่างข้อมูล
127
  - **วิธีใช้**:
128
- 1. คลิก "Upload CSV" เลือกไฟล์ CSV (เช่น `data/data.csv`)
129
- 2. แสดงตัวอย่าง 10 แถวแรกในตาราง HTML
130
- - **หมายเหตุ**: ใช้สำหรับตรวจสอบข้อมูลก่อนวิเคราะห์
131
-
132
- ### 2. Summarization
133
- - **วัตถุประสงค์**: สร้างสรุปภาษาธรรมชาติสำหรับเหตุการณ์ดับไฟฟ้า
 
 
 
134
  - **วิธีใช้**:
135
- 1. อัปโหลดไฟล์ CSV
136
- 2. เลือกแถวที่ต้องการสรุป (เว้นว่าง = ทั้งหมด, หรือใส่ index เช่น `0,1,2`)
137
- 3. เลือก "Use Hugging Face Router" ถ้ามี `HF_TOKEN` (ใช้ LLM สำหรับสรุปละเอียด)
138
- 4. เลือก verbosity: brief/medium/detailed
139
- 5. คลิก "Generate Summaries"
140
- 6. ดูผลสรุปในตาราง และ download ไฟล์ `summaries_from_ui.csv`
141
- - **ฟีเจอร์**: รองรับ fallback เป็น rule-based summary ถ้าไม่มี HF token
142
- - **ผลลัพธ์**: CSV กับคอลัมน์ EventNumber, OutageDateTime, Summary
143
-
144
- ### 3. Anomaly Detection
145
- - **วัตถุประสงค์**: ตรวจหาเหตุการณ์ผิดปกติโดยใช้ Machine Learning
146
  - **วิธีใช้**:
147
- 1. อัปโหลดไฟล์ CSV
148
- 2. เลือก algorithm: both (IsolationForest + LOF), iso, หรือ lof
149
- 3. ปรับ contamination (0.01-0.2, ค่าเริ่มต้น 0.05)
150
- 4. คลิก "Run Anomaly Detection"
151
- 5. ดูผลในตาราง (รวมคอลัมน์ anomaly flags และ explanations)
152
- 6. Download ไฟล์ `anomalies_from_ui.csv`
153
- - **ฟีเจอร์**: Feature engineering (duration, load, affected customers, etc.), z-score explanations
154
- - **ผลลัพธ์**: CSV กับคอลัมน์ anomaly scores และ textual explanations
155
-
156
- ### 4. Forecasting
157
- - **วัตถุประสงค์**: พยากรณ์จำนวนเหตุการณ์หรือ downtime ในอนาคต
158
  - **วิธีใช้**:
159
- 1. อัปโหลดไฟล์ CSV
160
- 2. เลือก metric: count (จำนวนเหตุการณ์) หรือ downtime_minutes
161
- 3. ปรับ horizon (7-90 วัน, ค่าเริ่มต้น 14)
162
- 4. คลิก "Run Forecast"
163
- 5. ดูผลในตาราง (แสดง actual ล่าสุด + forecast)
164
- 6. Download ไฟล์ `forecast_{metric}_from_ui.csv`
165
- - **ฟีเจอร์**: ใช้ Prophet ถ้าติดตั้งได้, มิฉะนั้นใช้ naive forecast
166
- - **ผลลัพธ์**: CSV กับคอลัมน์ date และ predicted values
167
 
168
  ### 5. Classification
169
- - **วัตถุประสงค์**: จัดประเภทสาเหตุของเหตุการณ์ (Root-Cause Classification)
170
  - **วิธีใช้**:
171
- 1. อัปโหลดไฟล์ CSV
172
- 2. เลือก "Run weak-labeling using HF" ถ้าต้องการใช้ LLM เสริม labels (ต้องมี HF_TOKEN)
173
- 3. เลือก "Run GridSearch" สำหรับ hyperparameter tuning (ช้ากว่า)
174
- 4. คลิก "Train Classifier"
175
- 5. ดู classification report ใน textbox
176
- 6. Download โมเดล `rf_cause_pipeline.joblib` และ predictions `predictions_cause.csv`
177
- - **ฟีเจอร์**: Feature engineering, RandomForest pipeline, optional GridSearchCV, weak-labeling ด้วย HF Router
178
- - **ผลลัพธ์**: โมเดล trained และ predictions CSV
 
 
 
 
 
 
 
 
 
 
 
 
 
 
179
 
180
  ## หมายเหตุ
181
- - เป็น prototype: ฟีเจอร์อาจไม่สมบูรณ์หรือปรับปรุงได้
182
- - สำหรับ deployment บน Hugging Face Spaces: อัปโหลดโค้ดและตั้งค่า HF_TOKEN ใน Secrets
183
- - ข้อมูลตัวอย่าง: `data/data.csv` มีคอลัมน์เช่น EventNumber, OutageDateTime, CauseType, etc.
184
- - ถ้ามีปัญหา: ตรวจสอบ console สำหรับ errors และ ensure Python environment ถูกต้อง
185
 
186
  ## การพัฒนาเพิ่มเติม
187
- - เพิ่ม interactive maps สำหรับ anomalies
188
- - SHAP explanations สำหรับ models
189
- - Human-in-the-loop สำหรับ weak labels
190
- - Alerting และ real-time processing
 
1
  # OMS Analyze — Prototype
2
+ > Created by PEACE, Powered by AI, Version 0.0.1
3
 
4
+ Prototype Application Platform สำหรับวิเคราะห์ข้อมูลการดับไฟฟ้า (OMS - Outage Management System) โดยใช้ AI และ Machine Learning เพื่อสรุป สืบหาความผิดปกติ พยากรณ์ และจำแนกสาเหตุ
5
 
6
+ แอปสร้างด้วย Gradio สำหรับใช้งานผ่านเว็บเบราว์เซอร์ รองรับการอัปโหลดไฟล์ CSV เพื่อวิเคราะห์ และรองรับการใช้งานบน Hugging Face Spaces
7
 
8
+ ## วิธีการติดตั้งและใช้งาน
9
 
10
+ ### วิธีการใช้งานผ่าน Docker (แนะนำ)
11
+ ต้องติดตั้ง Docker ก่อนจึงจะใช้งานได้ [(ดาวน์โหลด Rancher Desktop)](https://github.com/rancher-sandbox/rancher-desktop/releases/download/v1.20.0/Rancher.Desktop.Setup.1.20.0.msi)
12
+ ```
13
+ docker build -t ai-oms-analyze:latest .
14
+ docker run -d --rm -p 7860:7860 -p 8000:8000 --env-file .env -v $(pwd)/outputs:/app/outputs ai-oms-analyze:latest
15
+ ```
16
 
 
17
 
18
+ ### วิธีการใช้งาน (MacOS)
19
 
20
  ```
21
+ brew install pyenv
22
+ cd /AI-OMS-Analyze
23
  pyenv install 3.12.9
24
  pyenv local 3.12.9
25
  /Users/$(whoami)/.pyenv/versions/3.12.9/bin/python -m venv .venv
26
  source .venv/bin/activate
27
  python -m pip install --upgrade pip setuptools wheel
28
  python -m pip install -r requirements.txt
29
+ python app.py
30
  ```
31
 
32
+ ### วิธีการใช้งาน (Windows)
34
+ ```powershell
35
+ Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
36
+ choco install pyenv-win
37
+ Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy Unrestricted
38
+ cd /AI-OMS-Analyze
39
+ pyenv install 3.12.9
40
+ pyenv local 3.12.9
41
+ pip install -r requirements.txt
42
  python app.py
43
  ```
 
44
 
45
+ ## เมนูการใช้งาน
46
+ แอปแบ่งเป็นแท็บต่าง ๆ ดังนี้ (ตรงกับ UI ใน `app.py`):
48
  ### 1. Upload & Preview
49
+ - **Usecase Scenario**: อัปโหลดไฟล์ CSV เพื่อตรวจสอบข้อมูลต้นฉบับ และทำความสะอาดข้อมูล (ลบข้อมูลซ้ำ, จัดการค่าที่หายไป)
50
  - **วิธีใช้**:
51
+ 1. คลิก "Upload CSV (data.csv)" และเลือกไฟล์
52
+ 2. ปรับตัวเลือกเช่น Remove Duplicates และ Missing Values Handling
53
+ 3. คลิก "Apply Cleansing" เพื่อรันการทำความสะอาด
54
+ 4. เปรียบเทียบตัวอย่างข้อมูลในแท็บ "Original Data" และ "Cleansed Data"
55
+ 5. ดาวน์โหลดไฟล์ผลลัพธ์จากปุ่ม "Download Cleansed CSV"
56
+ - **ผลลัพธ์**: ไฟล์ `outputs/cleansed_data.csv` (ดาวน์โหลดผ่าน UI)
57
+
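The cleansing options in this tab correspond to standard pandas operations. A minimal sketch of the equivalent logic, for readers who want to reproduce it outside the UI (this illustrates the operations only, not the actual `scripts/data_cleansing.cleanse_data` implementation; the column name used is an assumption):

```python
import pandas as pd

df = pd.read_csv('data/data.csv', dtype=str)

# "Remove Duplicates" option: drop fully identical rows
df = df.drop_duplicates()

# "Missing Values Handling" option: e.g. drop rows missing a required
# column, or fill gaps with a placeholder instead
df = df.dropna(subset=['OutageDateTime'])   # assumes this column exists in the CSV
df = df.fillna('')

df.to_csv('outputs/cleansed_data.csv', index=False, encoding='utf-8-sig')
```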
58
+ ### 2. Recommendation
59
+ - **Usecase Scenario**: สร้างสรุปข้อความสำหรับเหตุการณ์ที่เลือก (เช่น สรุปเหตุการณ์ไฟฟ้าขัดข้องหรือบำรุงรักษา) และส่งออก CSV ของสรุป
60
  - **วิธีใช้**:
61
+ 1. คลิก "Upload CSV (data.csv)"
62
+ 2. กรอก index ของแถวที่ต้องการในช่อง "Rows (comma-separated indexes)" หรือเว้นว่างเพื่อสรุปทุกแถว (ดูตัวอย่างโค้ดท้ายหัวข้อนี้)
63
+ 3. เลือกว่าต้องการใช้ Generative AI (Use Generative AI) หรือไม่
64
+ 4. เลือกระดับสรุป (Summary Type) แล้วคลิก "Generate Summaries"
65
+ 5. ดูผลในตาราง และดาวน์โหลด `outputs/summaries_from_ui.csv`
66
+ - **ฟีเจอร์**: รองรับการใช้ GenAI (model selector จะปรากฏเมื่อเปิด Use Generative AI)
67
+
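The "Rows" field takes 0-based row indexes. A small sketch of how such a comma-separated index string can be resolved against the uploaded CSV (illustrative only; the actual parsing lives in `app.py` and `scripts/recommendation.py`):

```python
import pandas as pd

df = pd.read_csv('data/data.csv', dtype=str)

rows_text = '0,1,2'   # value typed into "Rows (comma-separated indexes)"
if rows_text.strip():
    idx = [int(i) for i in rows_text.split(',') if i.strip() != '']
    selected = df.iloc[idx]
else:
    selected = df      # empty field = summarize every row

print(len(selected), 'rows will be summarized')
```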
68
+ ### 3. Summary
69
+ - **Usecase Scenario**: สร้างสรุปภาพรวมของชุดข้อมูลทั้งชุด รวมสถิติพื้นฐาน และคำนวณดัชนีความน่าเชื่อถือ (SAIFI, SAIDI, CAIDI)
 
 
70
  - **วิธีใช้**:
71
+ 1. คลิก "Upload CSV for Overall Summary"
72
+ 2. เลือกว่าจะใช้ Generative AI ในการขยายความหรือไม่
73
+ 3. กำหนดจำนวนลูกค้าทั้งหมดสำหรับการคำนวณ reliability
74
+ 4. คลิก "Generate Overall Summary" เพื่อรับ AI summary, basic statistics และ reliability metrics
75
+
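The reliability indices follow the usual IEEE 1366 definitions, which is also how `scripts/compute_reliability.py` computes them: SAIFI = total customers interrupted / total customers served, SAIDI = total customer-minutes of interruption / total customers served, and CAIDI = SAIDI / SAIFI. A small worked sketch with made-up numbers:

```python
# Toy example: three outage events for a system serving 500,000 customers
events = [
    {'customers': 1200, 'duration_min': 45},
    {'customers':  300, 'duration_min': 90},
    {'customers': 5000, 'duration_min': 10},
]
total_customers = 500_000

total_interruptions    = sum(e['customers'] for e in events)                       # 6,500
total_customer_minutes = sum(e['customers'] * e['duration_min'] for e in events)   # 131,000

saifi = total_interruptions / total_customers     # ~0.013 interruptions per customer
saidi = total_customer_minutes / total_customers  # ~0.262 minutes per customer
caidi = saidi / saifi                             # ~20.2 minutes per interruption
print(saifi, saidi, caidi)
```

The same indices can also be produced in batch with `python scripts/compute_reliability.py --input data/data_1.csv --total-customers 500000` (see that module's docstring).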
76
+ ### 4. Anomaly Detection
77
+ - **Usecase Scenario**: ตรวจจับเหตุการณ์ที่ผิดปกติโดยใช้หลาย algorithm (Isolation Forest, LOF, Autoencoder)
 
 
 
 
78
  - **วิธีใช้**:
79
+ 1. คลิก "Upload CSV for Anomaly"
80
+ 2. เลือก algorithm และปรับค่า contamination
81
+ 3. คลิก "Run Anomaly Detection"
82
+ 4. ดูผลลัพธ์ในตารางและดาวน์โหลด `outputs/anomalies_from_ui.csv`
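
The `contamination` value is the expected fraction of anomalous rows passed to the scikit-learn detectors. A minimal sketch of the idea on a single numeric feature (illustrative only; the app's own feature engineering lives in `scripts/` and is not reproduced here, and the column name is an assumption):

```python
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

df = pd.read_csv('data/data.csv')
# assumed numeric column; replace with the features you actually want to screen
X = df[['AffectedCustomer']].apply(pd.to_numeric, errors='coerce').fillna(0)

iso = IsolationForest(contamination=0.05, random_state=0)
df['iso_anomaly'] = iso.fit_predict(X) == -1      # True = flagged as anomaly

lof = LocalOutlierFactor(contamination=0.05)
df['lof_anomaly'] = lof.fit_predict(X) == -1

print(df[['iso_anomaly', 'lof_anomaly']].sum())
```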
 
 
 
 
83
 
84
  ### 5. Classification
85
+ - **Usecase Scenario**: ฝึกโมเดลเพื่อจำแนกสาเหตุของเหตุการณ์ (เลือก Target Column เช่น CauseType หรือ SubCauseType)
86
  - **วิธีใช้**:
87
+ 1. คลิก "Upload CSV for Classification"
88
+ 2. เลือก Target Column และชนิดโมเดล (rf/gb/mlp)
89
+ 3. ปรับ Hyperparameters ใน Accordion (ถ้าจำเป็น) หรือเปิด Weak-labeling เพื่อเรียกใช้ HF
90
+ 4. คลิก "Train Classifier" แล้วรอรายงานผล
91
+ 5. ดาวน์โหลดโมเดลและไฟล์ predictions ผ่านปุ่มที่ปรากฏ
92
+
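Training can also be driven programmatically from `scripts/classify.py`. A hedged sketch, assuming the import path shown and that `train_classifier` keeps the keyword arguments visible in this commit (`label_col`, `model_type`); the returned dictionary keys are taken from the function body in this diff:

```python
import pandas as pd
from scripts.classify import train_classifier   # assumed import path for scripts/classify.py

df = pd.read_csv('data/data.csv', dtype=str)

result = train_classifier(df, label_col='CauseType', model_type='rf')
print(result['report'])             # classification report text
print(result['model_file'])         # e.g. outputs/classifier_rf_CauseType.joblib
print(result['predictions_file'])   # CSV with per-row predictions
```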
93
+ ### 6. Label Suggestion
94
+ - **Usecase Scenario**: แนะนำป้ายกำกับสำหรับเหตุการณ์ที่ไม่มีฉลาก โดยยึดจากความคล้ายกับตัวอย่างที่มีฉลาก
95
+ - **วิธีใช้**:
96
+ 1. คลิก "Upload CSV (defaults to data/data_3.csv)" หรือปล่อยให้ใช้ไฟล์เริ่มต้น
97
+ 2. เลือกจำนวนคำแนะนำสูงสุด (Top K suggestions)
98
+ 3. คลิก "Run Label Suggestion" แล้วดาวน์โหลด `outputs/label_suggestions.csv`
99
+
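Conceptually this tab is a nearest-neighbour lookup: unlabeled events are compared with labeled ones and the closest labels are suggested. A small TF-IDF / cosine-similarity sketch of that idea (an illustration only, not the repository's actual implementation; the `Detail` text column is an assumption):

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

df = pd.read_csv('data/data_3.csv', dtype=str)
labeled   = df[df['CauseType'].notna()]
unlabeled = df[df['CauseType'].isna()]

vec = TfidfVectorizer()
X_lab = vec.fit_transform(labeled['Detail'].fillna(''))   # 'Detail' column is an assumption
X_unl = vec.transform(unlabeled['Detail'].fillna(''))

sim = cosine_similarity(X_unl, X_lab)
top_k = 1
for i, row_sim in enumerate(sim):
    best = row_sim.argsort()[::-1][:top_k]            # indexes of the most similar labeled rows
    suggestions = labeled.iloc[best]['CauseType'].tolist()
    print(unlabeled.index[i], suggestions)
```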
100
+ ### 7. Forecasting
101
+ - **Usecase Scenario**: พยากรณ์จำนวนเหตุการณ์หรือ downtime ในอนาคตโดยเลือกโมเดล (Prophet, LSTM, Bi-LSTM, GRU, Naive)
102
+ - **วิธีใช้**:
103
+ 1. คลิก "Upload CSV for Forecasting"
104
+ 2. เลือก metric (count หรือ downtime_minutes) และ model
105
+ 3. ปรับ periods/horizon และ (ถ้าจำเป็น) เปิด Multivariate สำหรับ DL models
106
+ 4. คลิก "Run Forecasting" เพื่อดู Historical Data, Forecast Results และ Time Series Plot
107
+ 5. ดาวน์โหลดไฟล์ forecast ที่สร้างใน `outputs/` (ชื่อไฟล์รูปแบบ `forecast_{metric}_{model}_...csv`)
108
+
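The Forecasting tab calls `run_forecast` from `scripts/forecast.py`; the sketch below mirrors the arguments `app.py` passes in this commit (the hyperparameter keys shown are the ones the UI forwards for the deep-learning models):

```python
import pandas as pd
from scripts.forecast import run_forecast

df = pd.read_csv('data/data.csv', dtype=str)

ts, fcst = run_forecast(
    df,
    metric='count',           # or 'downtime_minutes'
    periods=7,                # forecast horizon in days
    model_type='lstm',        # 'prophet', 'lstm', 'bilstm', 'gru' or 'naive'
    multivariate=False,       # multiple input features: LSTM/Bi-LSTM/GRU only
    hyperparams={'seq_length': 7, 'epochs': 100, 'batch_size': 16,
                 'learning_rate': 0.001, 'units': 100, 'dropout_rate': 0.2},
)

print(ts.tail())              # historical series with columns ds / y
print(fcst.head())            # forecast with ds / yhat (and bounds when available)
fcst.to_csv('outputs/forecast_count_lstm_univariate.csv', index=False)
```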
109
 
110
  ## หมายเหตุ
111
+ - เป็น prototype ยังไม่สามารถใช้งานบนระดับ Production ได้
112
+ - แนะนำแหล่งอ่าน Machine Learning [ที่นี่](https://guopai.github.io/)
 
 
113
 
114
  ## การพัฒนาเพิ่มเติม
115
+ - TBA
 
 
 
app.py CHANGED
@@ -1,7 +1,7 @@
1
  import gradio as gr
2
  import pandas as pd
3
  from pathlib import Path
4
- from scripts.summarize import summarize_events
5
  from scripts.data_cleansing import cleanse_data
6
  from dotenv import load_dotenv
7
  import os
@@ -36,7 +36,7 @@ with gr.Blocks() as demo:
36
  with gr.Tabs():
37
  # Upload & Preview tab
38
  with gr.TabItem('Upload & Preview'):
39
- gr.Markdown("**อัปโหลด & แสดงตัวอย่าง**: อัปโหลดไฟล์ CSV เพื่อแสดงตัวอย่างข้อมูล ใช้ algorithm ทำความสะอาดข้อมูล (ลบข้อมูลซ้ำ, จัดการค่าที่หายไป) และเปรียบเทียบข้อมูลเดิมกับข้อมูลที่ทำความสะอาด ดาวน์โหลด CSV ที่ทำความสะอาดแล้ว")
40
  csv_up = gr.File(label='Upload CSV (data.csv)')
41
  with gr.Row():
42
  remove_dup = gr.Checkbox(label='Remove Duplicates', value=False)
@@ -74,9 +74,9 @@ with gr.Blocks() as demo:
74
  csv_up.change(fn=initial_preview, inputs=csv_up, outputs=[original_preview, cleansed_preview, clean_status])
75
  apply_clean.click(fn=apply_cleansing, inputs=[csv_up, remove_dup, missing_handling], outputs=[cleansed_preview, clean_status, download_cleansed])
76
 
77
- # Summarization tab
78
- with gr.TabItem('Summarization'):
79
- gr.Markdown("**สรุปเหตุการณ์**: สร้างสรุปข้อความสำหรับเหตุการณ์ไฟฟ้าล้ม เลือกแถวเฉพาะ, ระดับรายละเอียด และเลือกใช้ AI (Hugging Face) เพื่อเพิ่มความละเอียด ดาวน์โหลด CSV สรุป")
80
  csv_in = gr.File(label='Upload CSV (data.csv)')
81
  with gr.Row():
82
  rows = gr.Textbox(label='Rows (comma-separated indexes) or empty = all', placeholder='e.g. 0,1,2')
@@ -121,8 +121,75 @@ with gr.Blocks() as demo:
121
  use_hf.change(fn=update_model_visibility, inputs=use_hf, outputs=model_selector)
122
 
123
run_btn.click(fn=run_summarize, inputs=[csv_in, rows, use_hf, verbosity], outputs=[out, status, download])
124
  with gr.TabItem('Anomaly Detection'):
125
- gr.Markdown("**ตรวจจับความผิดปกติ**: ตรวจจับเหตุการณ์ไฟฟ้าล้มที่ผิดปกติโดยใช้ algorithm การเรียนรู้ของเครื่อง (Isolation Forest, Local Outlier Factor, Autoencoder) ตั้งระดับการปนเปื้อนและดาวน์โหลดผลลัพธ์พร้อมธงความผิดปกติ")
126
  csv_in_anom = gr.File(label='Upload CSV for Anomaly')
127
  with gr.Row():
128
  alg = gr.Radio(choices=['both','iso','lof','autoencoder'], value='both', label='Algorithm')
@@ -146,36 +213,9 @@ with gr.Blocks() as demo:
146
 
147
  run_anom.click(fn=run_anomaly_ui, inputs=[csv_in_anom, alg, contamination], outputs=[anom_out, anom_status, anom_download])
148
 
149
- # Forecasting tab
150
- with gr.TabItem('Forecasting'):
151
- gr.Markdown("**พยากรณ์**: พยากรณ์จำนวนเหตุการณ์หรือเวลาหยุดทำงานในอนาคตโดยใช้การวิเคราะห์อนุกรมเวลา (Prophet) เลือกเมตริกและช่วงพยากรณ์ ดาวน์โหลด CSV พยากรณ์")
152
- csv_in_fc = gr.File(label='Upload CSV for Forecast')
153
- with gr.Row():
154
- metric_fc = gr.Radio(choices=['count','downtime_minutes'], value='count', label='Metric')
155
- horizon = gr.Slider(minimum=7, maximum=90, value=14, step=1, label='Horizon (days)')
156
- run_fc = gr.Button('Run Forecast')
157
- fc_out = gr.Dataframe()
158
- fc_status = gr.Textbox(label='Forecast Status', interactive=False)
159
- fc_download = gr.File(label='Download forecast CSV')
160
-
161
- def run_forecast_ui(file, metric, horizon_days):
162
- if file is None:
163
- return pd.DataFrame(), 'No file provided', None
164
- from scripts.forecast import prepare_timeseries, run_forecast
165
- df = pd.read_csv(file.name, dtype=str)
166
- ts, fcst = run_forecast(df, metric=metric, periods=int(horizon_days))
167
- out_file = ROOT / 'outputs' / f'forecast_{metric}_from_ui.csv'
168
- out_file.parent.mkdir(exist_ok=True)
169
- fcst.to_csv(out_file, index=False, encoding='utf-8-sig')
170
- status = f"Forecast produced: {len(fcst)} rows (horizon {horizon_days} days)."
171
- display_df = pd.concat([ts.tail(30).rename(columns={'y':'actual'}).set_index('ds'), fcst.set_index('ds')], axis=1).reset_index()
172
- return display_df, status, str(out_file)
173
-
174
- run_fc.click(fn=run_forecast_ui, inputs=[csv_in_fc, metric_fc, horizon], outputs=[fc_out, fc_status, fc_download])
175
-
176
  # Classification tab
177
  with gr.TabItem('Classification'):
178
- gr.Markdown("**จำแนกประเภท**: ฝึกโมเดลการเรียนรู้ของเครื่องเพื่อจำแนกสาเหตุของไฟฟ้าล้ม เลือกประเภทโมเดล (Random Forest, Gradient Boosting, MLP), เปิดใช้งาน weak labeling หรือ grid search ดาวน์โหลดโมเดลที่ฝึกแล้วและการทำนาย")
179
  csv_in_cls = gr.File(label='Upload CSV for Classification')
180
  with gr.Row():
181
  label_col = gr.Dropdown(choices=['CauseType','SubCauseType'], value='CauseType', label='Target Column')
@@ -294,7 +334,7 @@ with gr.Blocks() as demo:
294
 
295
  # Label Suggestion tab
296
  with gr.TabItem('Label Suggestion'):
297
- gr.Markdown("**แนะนำป้ายกำกับ**: แนะนำป้ายกำกับสาเหตุที่เป็นไปได้สำหรับเหตุการณ์ไฟฟ้าล้มที่ไม่รู้สาเหตุ โดยอิงจากความคล้ายกับสาเหตุที่รู้จัก ตั้งจำนวนคำแนะนำสูงสุด ดาวน์โหลด CSV คำแนะนำ")
298
  csv_in_ls = gr.File(label='Upload CSV (defaults to data/data_3.csv)')
299
  with gr.Row():
300
  top_k = gr.Slider(minimum=1, maximum=5, value=1, step=1, label='Top K suggestions')
@@ -321,5 +361,167 @@ with gr.Blocks() as demo:
321
 
322
run_ls.click(fn=run_label_suggestion, inputs=[csv_in_ls, top_k], outputs=[ls_out, ls_status, ls_download])
324
  if __name__ == '__main__':
325
  demo.launch()
 
1
  import gradio as gr
2
  import pandas as pd
3
  from pathlib import Path
4
+ from scripts.recommendation import summarize_events
5
  from scripts.data_cleansing import cleanse_data
6
  from dotenv import load_dotenv
7
  import os
 
36
  with gr.Tabs():
37
  # Upload & Preview tab
38
  with gr.TabItem('Upload & Preview'):
39
+ gr.Markdown("**Usecase Scenario — Upload & Preview**: อัปโหลดไฟล์ CSV เพื่อตรวจสอบข้อมูลต้นฉบับ ทำความสะอาดข้อมูล (ลบข้อมูลซ้ำ, จัดการค่าที่หายไป) เปรียบเทียบตัวอย่างก่อน/หลัง และดาวน์โหลดไฟล์ที่ทำความสะอาดแล้ว")
40
  csv_up = gr.File(label='Upload CSV (data.csv)')
41
  with gr.Row():
42
  remove_dup = gr.Checkbox(label='Remove Duplicates', value=False)
 
74
  csv_up.change(fn=initial_preview, inputs=csv_up, outputs=[original_preview, cleansed_preview, clean_status])
75
  apply_clean.click(fn=apply_cleansing, inputs=[csv_up, remove_dup, missing_handling], outputs=[cleansed_preview, clean_status, download_cleansed])
76
 
77
+ # Recommendation tab
78
+ with gr.TabItem('Recommendation'):
79
+ gr.Markdown("**Usecase Scenario Recommendation**: สร้างสรุปเหตุการณ์ (เช่น สรุปเหตุการณ์ไฟฟ้าล้ม) สำหรับแถวที่เลือก ปรับระดับรายละเอียด และเลือกใช้ Generative AI เพื่อเพิ่มความชัดเจน 및 ดาวน์โหลดไฟล์สรุป")
80
  csv_in = gr.File(label='Upload CSV (data.csv)')
81
  with gr.Row():
82
  rows = gr.Textbox(label='Rows (comma-separated indexes) or empty = all', placeholder='e.g. 0,1,2')
 
121
  use_hf.change(fn=update_model_visibility, inputs=use_hf, outputs=model_selector)
122
 
123
  run_btn.click(fn=run_summarize, inputs=[csv_in, rows, use_hf, verbosity], outputs=[out, status, download])
124
+
125
+ # Summary tab
126
+ with gr.TabItem('Summary'):
127
+ gr.Markdown("**Usecase Scenario — Summary**: สร้างสรุปภาพรวมของชุดข้อมูลทั้งหมด รวมสถิติพื้นฐาน และคำนวณดัชนีความน่าเชื่อถือ (เช่น SAIFI, SAIDI, CAIDI) พร้อมตัวเลือกใช้ Generative AI ในการขยายความ")
128
+ csv_in_sum = gr.File(label='Upload CSV for Overall Summary')
129
+ with gr.Row():
130
+ use_hf_sum = gr.Checkbox(label='Use Generative AI for Summary', value=False)
131
+ total_customers = gr.Number(label='Total Customers (for reliability calculation)', value=500000, precision=0)
132
+ run_sum = gr.Button('Generate Overall Summary')
133
+ with gr.Row():
134
+ model_selector_sum = gr.Dropdown(
135
+ choices=[
136
+ 'meta-llama/Llama-3.1-8B-Instruct:novita',
137
+ 'meta-llama/Llama-4-Scout-17B-16E-Instruct:novita',
138
+ 'Qwen/Qwen3-VL-235B-A22B-Instruct:novita',
139
+ 'deepseek-ai/DeepSeek-R1:novita'
140
+ ],
141
+ value='meta-llama/Llama-3.1-8B-Instruct:novita',
142
+ label='GenAI Model',
143
+ interactive=True,
144
+ visible=False
145
+ )
146
+
147
+ with gr.Tabs():
148
+ with gr.TabItem('AI Summary'):
149
+ ai_summary_out = gr.Textbox(label='AI Generated Summary', lines=10)
150
+ with gr.TabItem('Basic Statistics'):
151
+ basic_stats_out = gr.JSON(label='Basic Statistics')
152
+ with gr.TabItem('Reliability Indices'):
153
+ reliability_out = gr.Dataframe(label='Reliability Metrics')
154
+
155
+ sum_status = gr.Textbox(label='Summary Status', interactive=False)
156
+
157
+ def run_overall_summary(file, use_hf_flag, total_cust, model):
158
+ if file is None:
159
+ return {}, {}, pd.DataFrame(), 'No file provided'
160
+ try:
161
+ from scripts.summary import summarize_overall
162
+ df = pd.read_csv(file.name, dtype=str)
163
+
164
+ result = summarize_overall(df, use_hf=use_hf_flag, model=model, total_customers=total_cust)
165
+
166
+ # Prepare outputs
167
+ ai_summary = result.get('ai_summary', 'ไม่สามารถสร้างสรุปด้วย AI ได้')
168
+ basic_stats = {
169
+ 'total_events': result.get('total_events'),
170
+ 'date_range': result.get('date_range'),
171
+ 'event_types': result.get('event_types'),
172
+ 'total_affected_customers': result.get('total_affected_customers')
173
+ }
174
+
175
+ # Reliability metrics as DataFrame
176
+ reliability_df = result.get('reliability_df', pd.DataFrame())
177
+
178
+ status = f"Summary generated for {len(df)} events. AI used: {use_hf_flag}"
179
+ return ai_summary, basic_stats, reliability_df, status
180
+
181
+ except Exception as e:
182
+ return f"Error: {str(e)}", {}, pd.DataFrame(), f'Summary failed: {e}'
183
+
184
+ def update_model_visibility_sum(use_hf_flag):
185
+ return gr.update(visible=use_hf_flag, interactive=use_hf_flag)
186
+
187
+ use_hf_sum.change(fn=update_model_visibility_sum, inputs=use_hf_sum, outputs=model_selector_sum)
188
+
189
+ run_sum.click(fn=run_overall_summary, inputs=[csv_in_sum, use_hf_sum, total_customers, model_selector_sum], outputs=[ai_summary_out, basic_stats_out, reliability_out, sum_status])
190
+
191
  with gr.TabItem('Anomaly Detection'):
192
+ gr.Markdown("**Usecase Scenario Anomaly Detection**: ตรวจจับเหตุการณ์ที่มีพฤติกรรมผิดปกติในชุดข้อมูล (เช่น เหตุการณ์ที่มีค่าสูง/ต่ำผิดปกติ) โดยใช้หลาย algorithm ปรับระดับ contamination และส่งออกผลลัพธ์พร้อมธงความผิดปกติ")
193
  csv_in_anom = gr.File(label='Upload CSV for Anomaly')
194
  with gr.Row():
195
  alg = gr.Radio(choices=['both','iso','lof','autoencoder'], value='both', label='Algorithm')
 
213
 
214
  run_anom.click(fn=run_anomaly_ui, inputs=[csv_in_anom, alg, contamination], outputs=[anom_out, anom_status, anom_download])
215
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
216
  # Classification tab
217
  with gr.TabItem('Classification'):
218
+ gr.Markdown("**Usecase Scenario Classification**: ฝึกและทดสอบโมเดลเพื่อจำแนกสาเหตุของเหตุการณ์ กำหนดคอลัมน์เป้าหมาย ปรับ hyperparameters, เปิดใช้งาน weak-labeling และดาวน์โหลดโมเดล/ผลการทำนาย")
219
  csv_in_cls = gr.File(label='Upload CSV for Classification')
220
  with gr.Row():
221
  label_col = gr.Dropdown(choices=['CauseType','SubCauseType'], value='CauseType', label='Target Column')
 
334
 
335
  # Label Suggestion tab
336
  with gr.TabItem('Label Suggestion'):
337
+ gr.Markdown("**Usecase Scenario Label Suggestion**: ให้คำแนะนำป้ายกำกับสาเหตุที่เป็นไปได้สำหรับเหตุการณ์ที่ไม่มีฉลาก โดยเทียบความคล้ายกับตัวอย่างที่มีฉลาก ปรับจำนวนคำแนะนำสูงสุด และส่งออกเป็นไฟล์ CSV")
338
  csv_in_ls = gr.File(label='Upload CSV (defaults to data/data_3.csv)')
339
  with gr.Row():
340
  top_k = gr.Slider(minimum=1, maximum=5, value=1, step=1, label='Top K suggestions')
 
361
 
362
  run_ls.click(fn=run_label_suggestion, inputs=[csv_in_ls, top_k], outputs=[ls_out, ls_status, ls_download])
363
 
364
+ # Forecasting tab
365
+ with gr.TabItem('Forecasting'):
366
+ gr.Markdown("**Usecase Scenario — Forecasting**: พยากรณ์จำนวนเหตุการณ์หรือเวลาหยุดทำงานในอนาคตโดยเลือกโมเดล (Prophet, LSTM, Bi-LSTM, GRU, Naive) ปรับพารามิเตอร์ และส่งออกผลการพยากรณ์")
367
+ gr.Markdown("*Multivariate forecasting (ใช้หลายฟีเจอร์) รองรับเฉพาะโมเดล LSTM, Bi-LSTM, GRU เท่านั้น*")
368
+ csv_in_fc = gr.File(label='Upload CSV for Forecasting')
369
+ with gr.Row():
370
+ metric_fc = gr.Radio(choices=['count','downtime_minutes'], value='count', label='Metric to Forecast')
371
+ model_type_fc = gr.Radio(choices=['prophet','lstm','bilstm','gru','naive'], value='lstm', label='Forecasting Model', elem_id='forecast_model_radio')
372
+ periods_fc = gr.Slider(minimum=1, maximum=30, value=7, step=1, label='Forecast Periods (days)')
373
+ multivariate_fc = gr.Checkbox(value=False, label='Use Multivariate (Multiple Features)', interactive=False)
374
+ run_fc = gr.Button('Run Forecasting')
375
+
376
+ # Add state to track current model
377
+ current_model_state = gr.State(value='lstm')
378
+
379
+ def update_multivariate_visibility(model_choice):
380
+ # Multivariate is only supported for LSTM, Bi-LSTM, GRU
381
+ supported_models = ['lstm', 'bilstm', 'gru']
382
+ is_supported = model_choice in supported_models
383
+ return gr.update(interactive=is_supported, value=False)
384
+
385
+ def update_model_state(model_choice):
386
+ return model_choice
387
+
388
+ # Hyperparameter controls for forecasting
389
+ with gr.Accordion("Hyperparameters (Advanced)", open=False):
390
+ gr.Markdown("Adjust hyperparameters for the selected forecasting model. Defaults are set for good performance.")
391
+
392
+ # Prophet hyperparameters
393
+ prophet_changepoint_prior = gr.Slider(minimum=0.001, maximum=0.5, value=0.05, step=0.001, label="Prophet: changepoint_prior_scale", visible=False)
394
+ prophet_seasonality_prior = gr.Slider(minimum=0.01, maximum=10.0, value=10.0, step=0.1, label="Prophet: seasonality_prior_scale", visible=False)
395
+ prophet_seasonality_mode = gr.Radio(choices=['additive', 'multiplicative'], value='additive', label="Prophet: seasonality_mode", visible=False)
396
+
397
+ # Deep learning hyperparameters (LSTM, Bi-LSTM, GRU)
398
+ dl_seq_length = gr.Slider(minimum=3, maximum=30, value=7, step=1, label="DL: sequence_length (lag/input length)", visible=True)
399
+ dl_epochs = gr.Slider(minimum=10, maximum=200, value=100, step=10, label="DL: epochs", visible=True)
400
+ dl_batch_size = gr.Slider(minimum=4, maximum=64, value=16, step=4, label="DL: batch_size", visible=True)
401
+ dl_learning_rate = gr.Slider(minimum=0.0001, maximum=0.01, value=0.001, step=0.0001, label="DL: learning_rate", visible=True)
402
+ dl_units = gr.Slider(minimum=32, maximum=256, value=100, step=16, label="DL: units (neurons)", visible=True)
403
+ dl_dropout = gr.Slider(minimum=0.0, maximum=0.5, value=0.2, step=0.05, label="DL: dropout_rate", visible=True)
404
+
405
+ # Naive has no hyperparameters
406
+
407
+ def update_forecast_hyperparams_visibility(model_choice):
408
+ prophet_visible = model_choice == 'prophet'
409
+ dl_visible = model_choice in ['lstm', 'bilstm', 'gru']
410
+ return [
411
+ gr.update(visible=prophet_visible), # prophet_changepoint_prior
412
+ gr.update(visible=prophet_visible), # prophet_seasonality_prior
413
+ gr.update(visible=prophet_visible), # prophet_seasonality_mode
414
+ gr.update(visible=dl_visible), # dl_seq_length
415
+ gr.update(visible=dl_visible), # dl_epochs
416
+ gr.update(visible=dl_visible), # dl_batch_size
417
+ gr.update(visible=dl_visible), # dl_learning_rate
418
+ gr.update(visible=dl_visible), # dl_units
419
+ gr.update(visible=dl_visible), # dl_dropout
420
+ ]
421
+
422
+ with gr.Tabs():
423
+ with gr.TabItem('Historical Data'):
424
+ hist_out = gr.Dataframe(label='Historical Time Series Data')
425
+ with gr.TabItem('Forecast Results'):
426
+ fcst_out = gr.Dataframe(label='Forecast Results')
427
+ with gr.TabItem('Time Series Plot'):
428
+ plot_out = gr.Plot(label='Historical + Forecast Plot')
429
+ fc_status = gr.Textbox(label='Forecast Status', interactive=False)
430
+ fc_download = gr.File(label='Download forecast CSV')
431
+
432
+ def run_forecast_ui(file, metric, model_type, periods, multivariate, current_model, prophet_cp, prophet_sp, prophet_sm, dl_sl, dl_e, dl_bs, dl_lr, dl_u, dl_d):
433
+ # Use current_model if available, otherwise use model_type
434
+ actual_model = current_model if current_model else model_type
435
+ if file is None:
436
+ return pd.DataFrame(), pd.DataFrame(), None, 'No file provided', None
437
+ try:
438
+ from scripts.forecast import run_forecast
439
+ import matplotlib.pyplot as plt
440
+ df = pd.read_csv(file.name, dtype=str)
441
+
442
+ # Build hyperparams dict based on model type
443
+ hyperparams = {}
444
+ if actual_model == 'prophet':
445
+ hyperparams = {
446
+ 'changepoint_prior_scale': prophet_cp,
447
+ 'seasonality_prior_scale': prophet_sp,
448
+ 'seasonality_mode': prophet_sm
449
+ }
450
+ elif actual_model in ['lstm', 'bilstm', 'gru']:
451
+ hyperparams = {
452
+ 'seq_length': int(dl_sl),
453
+ 'epochs': int(dl_e),
454
+ 'batch_size': int(dl_bs),
455
+ 'learning_rate': dl_lr,
456
+ 'units': int(dl_u),
457
+ 'dropout_rate': dl_d
458
+ }
459
+
460
+ ts, fcst = run_forecast(df, metric=metric, periods=periods, model_type=actual_model, multivariate=multivariate, hyperparams=hyperparams)
461
+
462
+ # Create time series plot
463
+ fig, ax = plt.subplots(figsize=(14, 7))
464
+
465
+ # Plot historical data
466
+ if len(ts) > 0 and 'y' in ts.columns:
467
+ ax.plot(ts['ds'], ts['y'], 'b-', label='Historical Data', linewidth=2, marker='o', markersize=4)
468
+
469
+ # Plot forecast data
470
+ if len(fcst) > 0 and 'yhat' in fcst.columns:
471
+ ax.plot(fcst['ds'], fcst['yhat'], 'r--', label='Forecast', linewidth=3, marker='s', markersize=5)
472
+ if 'yhat_lower' in fcst.columns and 'yhat_upper' in fcst.columns:
473
+ ax.fill_between(fcst['ds'], fcst['yhat_lower'], fcst['yhat_upper'],
474
+ color='red', alpha=0.3, label='Confidence Interval')
475
+
476
+ # Add vertical line to separate historical from forecast
477
+ if len(ts) > 0 and len(fcst) > 0:
478
+ last_hist_date = ts['ds'].max()
479
+ ax.axvline(x=last_hist_date, color='gray', linestyle='--', alpha=0.7, label='Forecast Start')
480
+
481
+ ax.set_title(f'Time Series Forecast: {model_type.upper()} ({metric.replace("_", " ").title()})',
482
+ fontsize=16, fontweight='bold', pad=20)
483
+ ax.set_xlabel('Date', fontsize=14)
484
+ ax.set_ylabel(metric.replace('_', ' ').title(), fontsize=14)
485
+ ax.legend(loc='upper left', fontsize=12)
486
+ ax.grid(True, alpha=0.3)
487
+
488
+ # Format x-axis dates
489
+ import matplotlib.dates as mdates
490
+ ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
491
+ ax.xaxis.set_major_locator(mdates.DayLocator(interval=max(1, len(ts) // 10)))
492
+ plt.xticks(rotation=45, ha='right')
493
+
494
+ plt.tight_layout()
495
+
496
+ # Save forecast results
497
+ mode = 'multivariate' if multivariate else 'univariate'
498
+ if multivariate and model_type not in ['lstm', 'bilstm', 'gru']:
499
+ mode += ' (fallback: model does not support multivariate)'
500
+ out_file = ROOT / 'outputs' / f'forecast_{metric}_{model_type}_{mode.replace(" ", "_")}.csv'
501
+ out_file.parent.mkdir(exist_ok=True)
502
+ fcst.to_csv(out_file, index=False)
503
+
504
+ status = f"Forecasting completed using {model_type.upper()} ({mode}). Historical data: {len(ts)} days, Forecast: {len(fcst)} days."
505
+ if multivariate and model_type not in ['lstm', 'bilstm', 'gru']:
506
+ status += " Note: Model does not support multivariate - used univariate instead."
507
+ return ts, fcst, fig, status, str(out_file)
508
+ except Exception as e:
509
+ import matplotlib.pyplot as plt
510
+ fig, ax = plt.subplots(figsize=(14, 7))
511
+ ax.text(0.5, 0.5, f'Forecasting Error:\n{str(e)}',
512
+ transform=ax.transAxes, ha='center', va='center',
513
+ fontsize=14, bbox=dict(boxstyle="round,pad=0.3", facecolor="lightcoral"))
514
+ ax.set_title('Time Series Forecast - Error Occurred', fontsize=16, fontweight='bold')
515
+ ax.set_xlim(0, 1)
516
+ ax.set_ylim(0, 1)
517
+ plt.axis('off')
518
+ return pd.DataFrame(), pd.DataFrame(), fig, f'Forecasting failed: {e}', None
519
+
520
+ model_type_fc.change(fn=update_multivariate_visibility, inputs=[model_type_fc], outputs=[multivariate_fc])
521
+ model_type_fc.change(fn=update_model_state, inputs=[model_type_fc], outputs=[current_model_state])
522
+ model_type_fc.change(fn=update_forecast_hyperparams_visibility, inputs=[model_type_fc], outputs=[prophet_changepoint_prior, prophet_seasonality_prior, prophet_seasonality_mode, dl_seq_length, dl_epochs, dl_batch_size, dl_learning_rate, dl_units, dl_dropout])
523
+
524
+ run_fc.click(fn=run_forecast_ui, inputs=[csv_in_fc, metric_fc, model_type_fc, periods_fc, multivariate_fc, current_model_state, prophet_changepoint_prior, prophet_seasonality_prior, prophet_seasonality_mode, dl_seq_length, dl_epochs, dl_batch_size, dl_learning_rate, dl_units, dl_dropout], outputs=[hist_out, fcst_out, plot_out, fc_status, fc_download])
525
+
526
  if __name__ == '__main__':
527
  demo.launch()
requirements.txt CHANGED
@@ -22,3 +22,4 @@ httpx==0.28.1
22
  orjson==3.11.3
23
  cmdstanpy==1.2.5
24
  stanio==0.5.1
 
 
22
  orjson==3.11.3
23
  cmdstanpy==1.2.5
24
  stanio==0.5.1
25
+ cloudpickle==3.1.1
scripts/classify.py CHANGED
@@ -6,14 +6,14 @@ import numpy as np
6
  from typing import Optional
7
 
8
  # sklearn imports
9
- from sklearn.model_selection import train_test_split, StratifiedKFold
10
  from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
11
  from sklearn.neural_network import MLPClassifier
12
  from sklearn.pipeline import Pipeline
13
  from sklearn.compose import ColumnTransformer
14
  from sklearn.preprocessing import OneHotEncoder, StandardScaler, LabelEncoder
15
  from sklearn.impute import SimpleImputer
16
- from sklearn.metrics import classification_report
17
  import joblib
18
 
19
  # Optional HF weak-labeling
@@ -174,6 +174,8 @@ def train_classifier(df: pd.DataFrame, label_col: str = 'CauseType', test_size:
174
 
175
  # save model
176
  model_file = Path('outputs') / f'classifier_{model_type}_{label_col}.joblib'
 
 
177
 
178
  # predictions on train set for download
179
  y_pred_train = pipeline.predict(X)
@@ -187,260 +189,3 @@ def train_classifier(df: pd.DataFrame, label_col: str = 'CauseType', test_size:
187
  'model_file': str(model_file),
188
  'predictions_file': str(preds_file)
189
  }
190
- df = parse_and_features(df)
191
-
192
- is_multi = len(label_cols) > 1
193
-
194
- # optionally weak-label rows missing label (only for single target)
195
- if not is_multi and label_cols[0] not in df.columns:
196
- df[label_cols[0]] = None
197
-
198
- if not is_multi and df[label_cols[0]].isna().sum() > 0 and HF_TOKEN:
199
- # attempt weak labeling for missing entries using Detail or FaultDetail
200
- for idx, row in df[df[label_cols[0]].isna()].iterrows():
201
- text = None
202
- for f in ['Detail','FaultDetail','SiteDetail']:
203
- if f in df.columns and pd.notna(row.get(f)):
204
- text = row.get(f)
205
- break
206
- if text:
207
- try:
208
- lbl = weak_label_with_hf(text)
209
- if lbl:
210
- df.at[idx, label_cols[0]] = lbl
211
- except Exception:
212
- pass
213
-
214
- # filter rare classes and drop na (for each label_col)
215
- for col in label_cols:
216
- if col not in df.columns:
217
- df[col] = None
218
- if df[col].notna().any():
219
- vc = df[col].value_counts()
220
- rare = vc[vc < min_count_to_keep].index
221
- if len(rare) > 0:
222
- df[col] = df[col].apply(lambda x: 'Other' if x in rare else x)
223
- df = df.dropna(subset=[col])
224
-
225
- # features
226
- feature_cols = ['duration_min','Load(MW)_num','Capacity(kVA)_num','AffectedCustomer_num','hour','weekday','device_freq','OpDeviceType','Owner','Weather','EventType']
227
- X = df[feature_cols]
228
-
229
- # target
230
- if is_multi:
231
- y = df[label_cols]
232
- # encode each target
233
- les = [LabelEncoder() for _ in label_cols]
234
- y_encoded = np.column_stack([le.fit_transform(y[col]) for le, col in zip(les, label_cols)])
235
- else:
236
- y = df[label_cols[0]].astype(str)
237
- le = LabelEncoder()
238
- y_encoded = le.fit_transform(y)
239
- les = [le]
240
-
241
- # split
242
- X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=test_size, random_state=random_state, stratify=y_encoded if not is_multi else None)
243
-
244
- # model
245
- if model_type == 'rf':
246
- clf = RandomForestClassifier(random_state=random_state)
247
- elif model_type == 'gb':
248
- clf = GradientBoostingClassifier(random_state=random_state)
249
- elif model_type == 'mlp':
250
- clf = MLPClassifier(random_state=random_state, max_iter=500)
251
- else:
252
- raise ValueError(f"Unknown model_type: {model_type}")
253
-
254
- # preprocessor
255
- preprocessor = ColumnTransformer(
256
- transformers=[
257
- ('num', Pipeline([('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())]), ['duration_min','Load(MW)_num','Capacity(kVA)_num','AffectedCustomer_num','hour','weekday','device_freq']),
258
- ('cat', Pipeline([('imputer', SimpleImputer(strategy='most_frequent')), ('encoder', OneHotEncoder(handle_unknown='ignore'))]), ['OpDeviceType','Owner','Weather','EventType'])
259
- ]
260
- )
261
-
262
- pipeline = Pipeline([('preprocessor', preprocessor), ('classifier', clf)])
263
-
264
- if do_gridsearch:
265
- param_grid = {
266
- 'classifier__n_estimators': [50, 100, 200] if hasattr(clf, 'n_estimators') else [1],
267
- 'classifier__max_depth': [None, 10, 20] if hasattr(clf, 'max_depth') else [1],
268
- }
269
- cv = 3 if not is_multi else KFold(n_splits=3, shuffle=True, random_state=random_state)
270
- scoring = 'accuracy' if not is_multi else None
271
- grid = GridSearchCV(pipeline, param_grid, cv=cv, scoring=scoring, n_jobs=-1)
272
- grid.fit(X_train, y_train)
273
- pipeline = grid.best_estimator_
274
-
275
- pipeline.fit(X_train, y_train)
276
-
277
- # predict
278
- y_pred = pipeline.predict(X_test)
279
-
280
- # report
281
- if is_multi:
282
- reports = []
283
- for i, col in enumerate(label_cols):
284
- y_test_i = y_test[:, i]
285
- y_pred_i = y_pred[:, i]
286
- y_test_inv = les[i].inverse_transform(y_test_i)
287
- y_pred_inv = les[i].inverse_transform(y_pred_i.astype(int))
288
- rep = classification_report(y_test_inv, y_pred_inv, zero_division=0)
289
- reports.append(f"Report for {col}:\n{rep}")
290
- report = '\n\n'.join(reports)
291
- else:
292
- y_test_inv = les[0].inverse_transform(y_test)
293
- y_pred_inv = les[0].inverse_transform(y_pred)
294
- report = classification_report(y_test_inv, y_pred_inv, zero_division=0)
295
-
296
- # save model
297
- model_file = Path('outputs') / f'classifier_{model_type}_{"_".join(label_cols)}.joblib'
298
- model_file.parent.mkdir(exist_ok=True)
299
- joblib.dump({'pipeline': pipeline, 'label_encoders': les}, model_file)
300
-
301
- # predictions on train set for download
302
- y_pred_train = pipeline.predict(X)
303
- if is_multi:
304
- pred_df = df.copy()
305
- for i, col in enumerate(label_cols):
306
- pred_df[f'Predicted_{col}'] = les[i].inverse_transform(y_pred_train[:, i].astype(int))
307
- else:
308
- pred_df = df.copy()
309
- pred_df[f'Predicted_{label_cols[0]}'] = les[0].inverse_transform(y_pred_train)
310
-
311
- preds_file = Path('outputs') / f'predictions_{model_type}_{"_".join(label_cols)}.csv'
312
- pred_df.to_csv(preds_file, index=False)
313
-
314
- return {
315
- 'report': report,
316
- 'model_file': str(model_file),
317
- 'predictions_file': str(preds_file)
318
- }
319
- df = parse_and_features(df)
320
-
321
- # optionally weak-label rows missing label
322
- if label_col not in df.columns:
323
- df[label_col] = None
324
-
325
- if df[label_col].isna().sum() > 0 and HF_TOKEN:
326
- # attempt weak labeling for missing entries using Detail or FaultDetail
327
- for idx, row in df[df[label_col].isna()].iterrows():
328
- text = None
329
- for f in ['Detail','FaultDetail','SiteDetail']:
330
- if f in df.columns and pd.notna(row.get(f)):
331
- text = row.get(f)
332
- break
333
- if text:
334
- lbl = weak_label_with_hf(text)
335
- if lbl:
336
- df.at[idx, label_col] = lbl
337
-
338
- # combine rare classes into 'Other' if needed
339
- if df[label_col].notna().any():
340
- vc = df[label_col].value_counts()
341
- rare = vc[vc < min_count_to_keep].index.tolist()
342
- if rare:
343
- df[label_col] = df[label_col].apply(lambda x: 'Other' if x in rare else x)
344
-
345
- df = df.dropna(subset=[label_col])
346
- if df.empty:
347
- raise ValueError('No labeled data available for training')
348
-
349
- # define features
350
- feature_cols = ['duration_min','Load(MW)_num','Capacity(kVA)_num','AffectedCustomer_num','hour','weekday','device_freq','OpDeviceType','Owner','Weather','EventType']
351
- X = df[feature_cols]
352
- y = df[label_col].astype(str)
353
-
354
- # encode labels to integers
355
- le = LabelEncoder()
356
- y_encoded = le.fit_transform(y)
357
-
358
- # simple train/test split
359
- X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=test_size, random_state=random_state, stratify=y_encoded)
360
-
361
- # preprocessing
362
- numeric_feats = ['duration_min','Load(MW)_num','Capacity(kVA)_num','AffectedCustomer_num','hour','weekday','device_freq']
363
- cat_feats = ['OpDeviceType','Owner','Weather','EventType']
364
-
365
- numeric_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])
366
- # sklearn versions differ on parameter name for sparse output
367
- try:
368
- cat_transformer = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
369
- except TypeError:
370
- cat_transformer = OneHotEncoder(handle_unknown='ignore', sparse=False)
371
-
372
- preprocessor = ColumnTransformer(transformers=[
373
- ('num', numeric_transformer, numeric_feats),
374
- ('cat', cat_transformer, cat_feats)
375
- ], remainder='drop')
376
-
377
- # choose classifier
378
- model_type = (model_type or 'rf').lower()
379
- if model_type == 'rf':
380
- clf_est = RandomForestClassifier(class_weight='balanced', random_state=random_state)
381
- clf_name = 'rf'
382
- elif model_type == 'gb':
383
- clf_est = GradientBoostingClassifier(random_state=random_state)
384
- clf_name = 'gb'
385
- elif model_type == 'mlp':
386
- clf_est = MLPClassifier(hidden_layer_sizes=(100,), max_iter=300, random_state=random_state)
387
- clf_name = 'mlp'
388
- else:
389
- raise ValueError(f'Unknown model_type: {model_type}')
390
-
391
- clf = Pipeline(steps=[('pre', preprocessor), ('clf', clf_est)])
392
-
393
- if do_gridsearch:
394
- if clf_name == 'rf':
395
- param_grid = {
396
- 'clf__n_estimators': [100,200],
397
- 'clf__max_depth': [None, 10, 20],
398
- 'clf__min_samples_split': [2,5]
399
- }
400
- elif clf_name == 'lgb':
401
- param_grid = {
402
- 'clf__n_estimators': [100,200],
403
- 'clf__num_leaves': [31,63]
404
- }
405
- elif clf_name == 'gb':
406
- param_grid = {
407
- 'clf__n_estimators': [100,200],
408
- 'clf__max_depth': [3,6]
409
- }
410
- elif clf_name == 'mlp':
411
- param_grid = {
412
- 'clf__hidden_layer_sizes': [(50,),(100,)],
413
- 'clf__alpha': [0.0001, 0.001]
414
- }
415
- else:
416
- param_grid = {}
417
- cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=random_state)
418
- gs = GridSearchCV(clf, param_grid, cv=cv, scoring='f1_weighted', n_jobs=1)
419
- gs.fit(X_train, y_train)
420
- best = gs.best_estimator_
421
- best_params = gs.best_params_
422
- model_to_save = best
423
- else:
424
- clf.fit(X_train, y_train)
425
- best_params = None
426
- model_to_save = clf
427
-
428
- y_pred = model_to_save.predict(X_test)
429
- unique_labels = np.unique(np.concatenate([y_test, y_pred]))
430
- target_names = [le.classes_[i] for i in unique_labels]
431
- report = classification_report(y_test, y_pred, target_names=target_names, zero_division=0)
432
- cm = confusion_matrix(y_test, y_pred)
433
-
434
- # save model pipeline
435
- out_dir = Path.cwd() / 'outputs'
436
- out_dir.mkdir(exist_ok=True)
437
- model_file = out_dir / f'{clf_name}_cause_pipeline.joblib'
438
- joblib.dump({'pipeline': model_to_save, 'label_encoder': le}, model_file)
439
-
440
- # save predictions
441
- pred_df = X_test.copy()
442
- pred_df['y_true'] = le.inverse_transform(y_test)
443
- pred_df['y_pred'] = le.inverse_transform(y_pred)
444
- pred_df.to_csv(out_dir / 'predictions_cause.csv', index=False, encoding='utf-8-sig')
445
-
446
- return {'model_file': str(model_file), 'report': report, 'confusion_matrix': cm, 'predictions_file': str(out_dir / 'predictions_cause.csv'), 'best_params': best_params}
 
6
  from typing import Optional
7
 
8
  # sklearn imports
9
+ from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
10
  from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
11
  from sklearn.neural_network import MLPClassifier
12
  from sklearn.pipeline import Pipeline
13
  from sklearn.compose import ColumnTransformer
14
  from sklearn.preprocessing import OneHotEncoder, StandardScaler, LabelEncoder
15
  from sklearn.impute import SimpleImputer
16
+ from sklearn.metrics import classification_report, confusion_matrix
17
  import joblib
18
 
19
  # Optional HF weak-labeling
 
174
 
175
  # save model
176
  model_file = Path('outputs') / f'classifier_{model_type}_{label_col}.joblib'
177
+ model_file.parent.mkdir(exist_ok=True)
178
+ joblib.dump({'pipeline': pipeline, 'label_encoder': le}, model_file)
179
 
180
  # predictions on train set for download
181
  y_pred_train = pipeline.predict(X)
 
189
  'model_file': str(model_file),
190
  'predictions_file': str(preds_file)
191
  }
scripts/compute_reliability.py ADDED
@@ -0,0 +1,242 @@
1
+ """Compute reliability indices (SAIFI, SAIDI, CAIDI, MAIFI) from outage event CSV.
2
+
3
+ Usage (programmatic):
4
+ from scripts.compute_reliability import compute_reliability
5
+ summary = compute_reliability('data/data_1.csv', total_customers=500000)
6
+
7
+ Usage (CLI):
8
+ python scripts/compute_reliability.py --input data/data_1.csv --total-customers 500000
9
+
10
+ Assumptions & mapping (from inspected CSV):
11
+ - Outage start: `OutageDateTime`
12
+ - Outage end: prefer `CloseEventDateTime`, else `LastRestoDateTime`, else `FirstRestoDateTime`
13
+ - Customers affected: prefer `AffectedCustomer` column; else sum `AffectedCustomer1..5`; else `AllStepCusXTime` or `AllStepCusXTime1..5` fallback.
14
+ - Planned outages: rows with `EventType` containing 'แผน' (e.g., 'แผนดับไฟ') are considered planned and can be excluded by default.
15
+ - Date format is day-first like '10-01-2025 10:28:00'.
16
+
17
+ Outputs saved to `outputs/reliability_summary.csv` and breakdown CSVs.
18
+ """
19
+
20
+ from __future__ import annotations
21
+ import argparse
22
+ from typing import Optional, Dict
23
+ import pandas as pd
24
+ import numpy as np
25
+ from pathlib import Path
26
+
27
+ DATE_COLS = ['OutageDateTime', 'FirstRestoDateTime', 'LastRestoDateTime', 'CreateEventDateTime', 'CloseEventDateTime']
28
+
29
+
30
+ def parse_dates(df: pd.DataFrame) -> pd.DataFrame:
31
+ for c in DATE_COLS:
32
+ if c in df.columns:
33
+ # many dates are in format dd-mm-YYYY HH:MM:SS
34
+ df[c] = pd.to_datetime(df[c], dayfirst=True, errors='coerce')
35
+ return df
36
+
37
+
38
+ def coalesce_end_time(row: pd.Series) -> pd.Timestamp | None:
39
+ for c in ('CloseEventDateTime', 'LastRestoDateTime', 'FirstRestoDateTime', 'CreateEventDateTime'):
40
+ if c in row and pd.notna(row[c]):
41
+ return row[c]
42
+ return pd.NaT
43
+
44
+
45
+ def estimate_customers(row: pd.Series) -> float:
46
+ # Prefer AffectedCustomer if present and numeric
47
+ def to_num(x):
48
+ try:
49
+ if pd.isna(x) or x == '':
50
+ return np.nan
51
+ return float(x)
52
+ except Exception:
53
+ return np.nan
54
+
55
+ cols = row.index
56
+ # Try AffectedCustomer
57
+ if 'AffectedCustomer' in cols:
58
+ v = to_num(row['AffectedCustomer'])
59
+ if not np.isnan(v):
60
+ return v
61
+ # Sum AffectedCustomer1..5
62
+ acs = []
63
+ for i in range(1, 6):
64
+ k = f'AffectedCustomer{i}'
65
+ if k in cols:
66
+ acs.append(to_num(row[k]))
67
+ acs = [x for x in acs if not np.isnan(x)]
68
+ if acs:
69
+ return float(sum(acs))
70
+ # Try AllStepCusXTime or AllStepCusXTime1..5
71
+ if 'AllStepCusXTime' in cols:
72
+ v = to_num(row['AllStepCusXTime'])
73
+ if not np.isnan(v):
74
+ return v
75
+ asts = []
76
+ for i in range(1, 6):
77
+ k = f'AllStepCusXTime{i}'
78
+ if k in cols:
79
+ asts.append(to_num(row[k]))
80
+ asts = [x for x in asts if not np.isnan(x)]
81
+ if asts:
82
+ return float(sum(asts))
83
+ # As last resort, try numeric columns near end: Capacity(kVA) or Load(MW) are not customer counts
84
+ return np.nan
85
+
86
+
87
+ def flag_planned(event_type: Optional[str]) -> bool:
88
+ if pd.isna(event_type):
89
+ return False
90
+ s = str(event_type)
91
+ # In this dataset planned outages use Thai word 'แผน'
92
+ if 'แผน' in s:
93
+ return True
94
+ # else treat as unplanned
95
+ return False
96
+
97
+
98
+ def compute_reliability(
+     input_csv: str | Path,
+     total_customers: Optional[float] = None,
+     customers_map: Optional[Dict[str, float]] = None,
+     exclude_planned: bool = True,
+     momentary_threshold_min: float = 1.0,
+     groupby_cols: list[str] | None = None,
+     out_dir: str | Path | None = 'outputs',
+ ) -> Dict[str, pd.DataFrame]:
+     """Read the events CSV and compute reliability indices.
+
+     Returns a dict of DataFrames: overall, by_group.
+     """
+     input_csv = Path(input_csv)
+     out_dir = Path(out_dir)
+     out_dir.mkdir(parents=True, exist_ok=True)
+
+     df = pd.read_csv(input_csv, dtype=str)
+     # parse dates
+     df = parse_dates(df)
+
+     # coalesce end time
+     df['OutageStart'] = df.get('OutageDateTime')
+     df['OutageEnd'] = df.apply(coalesce_end_time, axis=1)
+     # compute duration in minutes
+     df['DurationMin'] = (pd.to_datetime(df['OutageEnd']) - pd.to_datetime(df['OutageStart'])).dt.total_seconds() / 60.0
+
+     # customers affected
+     df['CustomersAffected'] = df.apply(estimate_customers, axis=1)
+
+     # flag planned outages
+     df['IsPlanned'] = df['EventType'].apply(flag_planned) if 'EventType' in df.columns else False
+
+     if exclude_planned:
+         df_work = df[~df['IsPlanned']].copy()
+     else:
+         df_work = df.copy()
+
+     # Fill missing or negative durations with 0
+     df_work['DurationMin'] = df_work['DurationMin'].fillna(0)
+     df_work.loc[df_work['DurationMin'] < 0, 'DurationMin'] = 0
+
+     # ensure numeric customers
+     df_work['CustomersAffected'] = pd.to_numeric(df_work['CustomersAffected'], errors='coerce').fillna(0)
+
+     # Choose grouping
+     if groupby_cols is None:
+         groupby_cols = []
+
+     # helper to compute indices given total customers
+     def compute_from_df(dfall: pd.DataFrame, cust_total: float) -> Dict[str, float]:
+         total_interruptions = dfall['CustomersAffected'].sum()
+         total_customer_minutes = (dfall['CustomersAffected'] * dfall['DurationMin']).sum()
+         # momentary interruptions: durations less than threshold
+         momentary_interruptions = dfall.loc[dfall['DurationMin'] < momentary_threshold_min, 'CustomersAffected'].sum()
+         saifi = total_interruptions / cust_total if cust_total and cust_total > 0 else np.nan
+         saidi = total_customer_minutes / cust_total if cust_total and cust_total > 0 else np.nan
+         caidi = (saidi / saifi) if (saifi and saifi > 0) else np.nan
+         maifi = momentary_interruptions / cust_total if cust_total and cust_total > 0 else np.nan
+         return {
+             'TotalInterruptions': total_interruptions,
+             'TotalCustomerMinutes': total_customer_minutes,
+             'MomentaryInterruptions': momentary_interruptions,
+             'SAIFI': saifi,
+             'SAIDI': saidi,
+             'CAIDI': caidi,
+             'MAIFI': maifi,
+         }
+
+     results = {}
+
+     if customers_map is not None:
+         # customers_map keys must match the grouping values (e.g., Feeder or AffectedAreaID); indices are computed per key
+         # For the overall figure, use total_customers if given, otherwise the sum of the map values
+         total_customers_map_sum = sum(customers_map.values())
+         overall = compute_from_df(df_work, total_customers_map_sum if total_customers is None else total_customers)
+         results['overall'] = pd.DataFrame([overall])
+
+         # per-group
+         if groupby_cols:
+             group = df_work.groupby(groupby_cols).agg({'CustomersAffected': 'sum', 'DurationMin': 'mean'})
+         else:
+             # if no group col provided, try Feeder then AffectedAreaID
+             if 'Feeder' in df_work.columns:
+                 groupby_cols = ['Feeder']
+             elif 'AffectedAreaID' in df_work.columns:
+                 groupby_cols = ['AffectedAreaID']
+             else:
+                 groupby_cols = []
+
+         if groupby_cols:
+             rows = []
+             for key, sub in df_work.groupby(groupby_cols):
+                 # key can be a tuple
+                 keyname = key if isinstance(key, str) else '_'.join(map(str, key))
+                 cust = customers_map.get(keyname, np.nan)
+                 metrics = compute_from_df(sub, cust if not np.isnan(cust) else np.nan)
+                 metrics.update({'Group': keyname})
+                 rows.append(metrics)
+             results['by_group'] = pd.DataFrame(rows)
+         else:
+             results['by_group'] = pd.DataFrame()
+     else:
+         # customers_map not provided: require total_customers
+         if total_customers is None:
+             raise ValueError('Either total_customers or customers_map must be provided to compute per-customer indices')
+         overall = compute_from_df(df_work, float(total_customers))
+         results['overall'] = pd.DataFrame([overall])
+         # per-group breakdowns
+         if groupby_cols:
+             rows = []
+             # Without per-group customer counts we can report interruption counts and durations,
+             # but not per-customer normalized indices; provide customers_map for those.
+             for key, sub in df_work.groupby(groupby_cols):
+                 keyname = key if isinstance(key, str) else '_'.join(map(str, key))
+                 rows.append({
+                     'Group': keyname,
+                     'TotalInterruptions': sub['CustomersAffected'].sum(),
+                     'TotalCustomerMinutes': (sub['CustomersAffected'] * sub['DurationMin']).sum(),
+                     'Events': len(sub),
+                 })
+             results['by_group'] = pd.DataFrame(rows)
+         else:
+             results['by_group'] = pd.DataFrame()
+
+     # Save CSVs
+     results['raw'] = df_work
+     results['raw'].to_csv(out_dir / 'events_cleaned.csv', index=False)
+     results['overall'].to_csv(out_dir / 'reliability_overall.csv', index=False)
+     if 'by_group' in results:
+         results['by_group'].to_csv(out_dir / 'reliability_by_group.csv', index=False)
+
+     return results
+
+
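For reference, a minimal worked example of the index arithmetic used by `compute_from_df` above, with toy numbers (not from the dataset): three events affecting 100, 50 and 10 customers for 30, 120 and 0.5 minutes on a system serving 1,000 customers.

```
import pandas as pd

events = pd.DataFrame({
    'CustomersAffected': [100, 50, 10],
    'DurationMin': [30.0, 120.0, 0.5],
})
cust_total = 1000.0
momentary_threshold_min = 1.0

total_interruptions = events['CustomersAffected'].sum()                                # 160 customer interruptions
total_customer_minutes = (events['CustomersAffected'] * events['DurationMin']).sum()   # 9005 customer-minutes
momentary = events.loc[events['DurationMin'] < momentary_threshold_min, 'CustomersAffected'].sum()  # 10

saifi = total_interruptions / cust_total      # 0.16 interruptions per customer served
saidi = total_customer_minutes / cust_total   # 9.005 minutes per customer served
caidi = saidi / saifi                         # ~56.3 minutes per interruption
maifi = momentary / cust_total                # 0.01 momentary interruptions per customer served
print(saifi, saidi, caidi, maifi)
```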
+ if __name__ == '__main__':
+     parser = argparse.ArgumentParser()
+     parser.add_argument('--input', '-i', required=True, help='Input CSV file')
+     parser.add_argument('--total-customers', type=float, help='Total customers served in the system (required if no customers map)')
+     parser.add_argument('--exclude-planned', action='store_true', help='Exclude planned outages')
+     parser.add_argument('--momentary-threshold-min', type=float, default=1.0, help='Threshold in minutes for a momentary interruption')
+     parser.add_argument('--groupby', nargs='*', default=['Feeder'], help='Columns to group by for the breakdown (default: Feeder)')
+     args = parser.parse_args()
+
+     res = compute_reliability(args.input, total_customers=args.total_customers, exclude_planned=args.exclude_planned, momentary_threshold_min=args.momentary_threshold_min, groupby_cols=args.groupby)
+     print('Wrote outputs to outputs/ (events_cleaned.csv, reliability_overall.csv, reliability_by_group.csv)')
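A minimal programmatic sketch of how this module can be called, assuming the repo root is on `sys.path`; the input path and the 120,000-customer figure are only illustrative:

```
from scripts.reliability import compute_reliability

# System-wide indices, excluding planned outages, with a per-Feeder breakdown
res = compute_reliability(
    'data/oms_events.csv',        # hypothetical input path
    total_customers=120_000,      # illustrative total customers served
    exclude_planned=True,
    momentary_threshold_min=1.0,
    groupby_cols=['Feeder'],
)
print(res['overall'][['SAIFI', 'SAIDI', 'CAIDI', 'MAIFI']])
print(res['by_group'].head())
```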
scripts/forecast.py CHANGED
@@ -9,8 +9,160 @@ try:
  except Exception:
      PROPHET_AVAILABLE = False
 
 
  def prepare_timeseries(df: pd.DataFrame, date_col: str = 'OutageDateTime', metric: str = 'count') -> pd.DataFrame:
      # date_col is in format DD-MM-YYYY HH:MM:SS
      df = df.copy()
      df['dt'] = pd.to_datetime(df[date_col], format='%d-%m-%Y %H:%M:%S', errors='coerce')
@@ -29,10 +181,23 @@ def prepare_timeseries(df: pd.DataFrame, date_col: str = 'OutageDateTime', metri
      return ts
 
 
- def forecast_prophet(ts: pd.DataFrame, periods: int = 7, freq: str = 'D') -> pd.DataFrame:
      if not PROPHET_AVAILABLE:
          raise RuntimeError('Prophet not available')
-     m = Prophet()
      m.fit(ts)
      future = m.make_future_dataframe(periods=periods, freq=freq)
      fcst = m.predict(future)
@@ -47,13 +212,1220 @@ def forecast_naive(ts: pd.DataFrame, periods: int = 7) -> pd.DataFrame:
      return pd.DataFrame({'ds': future_dates, 'yhat': [last_mean]*periods, 'yhat_lower': [np.nan]*periods, 'yhat_upper': [np.nan]*periods})
 
 
- def run_forecast(df: pd.DataFrame, metric: str = 'count', periods: int = 7):
-     ts = prepare_timeseries(df, metric=metric)
-     if PROPHET_AVAILABLE and len(ts) >= 14:
-         try:
-             fcst = forecast_prophet(ts, periods=periods)
-             return ts, fcst
-         except Exception:
-             warnings.warn('Prophet failed, falling back to naive')
-     fcst = forecast_naive(ts, periods=periods)
-     return ts, fcst
  except Exception:
      PROPHET_AVAILABLE = False
 
+ try:
13
+ import tensorflow as tf
14
+ from tensorflow.keras.models import Sequential
15
+ from tensorflow.keras.layers import LSTM, Bidirectional, GRU, Dense, Dropout
16
+ from sklearn.preprocessing import MinMaxScaler
17
+ TF_AVAILABLE = True
18
+ except Exception:
19
+ TF_AVAILABLE = False
20
+
21
+
22
+ def prepare_multivariate_timeseries(df: pd.DataFrame, date_col: str = 'OutageDateTime') -> pd.DataFrame:
23
+ """Prepare multivariate time series with multiple features"""
24
+ df = df.copy()
25
+ df['dt'] = pd.to_datetime(df[date_col], format='%d-%m-%Y %H:%M:%S', errors='coerce')
26
+ df = df.dropna(subset=['dt'])
27
+ df['day'] = df['dt'].dt.floor('D')
28
+
29
+ # Aggregate daily data
30
+ daily_data = df.groupby('day').agg({
31
+ 'EventNumber': 'count', # daily count
32
+ 'Load(MW)': lambda x: pd.to_numeric(x, errors='coerce').mean(),
33
+ 'Capacity(kVA)': lambda x: pd.to_numeric(x, errors='coerce').mean(),
34
+ 'AffectedCustomer': lambda x: pd.to_numeric(x, errors='coerce').sum(),
35
+ 'OpDeviceType': lambda x: x.mode().iloc[0] if len(x) > 0 else 'Unknown',
36
+ 'Owner': lambda x: x.mode().iloc[0] if len(x) > 0 else 'Unknown',
37
+ 'Weather': lambda x: x.mode().iloc[0] if len(x) > 0 else 'Unknown',
38
+ 'EventType': lambda x: x.mode().iloc[0] if len(x) > 0 else 'Unknown'
39
+ }).reset_index()
40
+
41
+ # Rename columns
42
+ daily_data = daily_data.rename(columns={
43
+ 'day': 'ds',
44
+ 'EventNumber': 'daily_count',
45
+ 'Load(MW)': 'avg_load_mw',
46
+ 'Capacity(kVA)': 'avg_capacity_kva',
47
+ 'AffectedCustomer': 'total_affected_customers'
48
+ })
49
+
50
+ # Calculate duration if available
51
+ if 'LastRestoDateTime' in df.columns:
52
+ df['last_dt'] = pd.to_datetime(df.get('LastRestoDateTime'), format='%d-%m-%Y %H:%M:%S', errors='coerce')
53
+ df['duration_min'] = (df['last_dt'] - df['dt']).dt.total_seconds() / 60.0
54
+ duration_agg = df.groupby('day')['duration_min'].sum().reset_index()
55
+ duration_agg = duration_agg.rename(columns={'day': 'ds', 'duration_min': 'total_downtime_min'})
56
+ daily_data = daily_data.merge(duration_agg, on='ds', how='left')
57
+ daily_data['total_downtime_min'] = daily_data['total_downtime_min'].fillna(0)
58
+ else:
59
+ daily_data['total_downtime_min'] = 0
60
+
61
+ # Add time features
62
+ daily_data['ds'] = pd.to_datetime(daily_data['ds'])
63
+ daily_data['day_of_week'] = daily_data['ds'].dt.dayofweek
64
+ daily_data['month'] = daily_data['ds'].dt.month
65
+ daily_data['is_weekend'] = daily_data['day_of_week'].isin([5, 6]).astype(int)
66
+
67
+ # Fill missing numeric values
68
+ numeric_cols = ['avg_load_mw', 'avg_capacity_kva', 'total_affected_customers', 'total_downtime_min']
69
+ for col in numeric_cols:
70
+ daily_data[col] = pd.to_numeric(daily_data[col], errors='coerce').fillna(daily_data[col].mean())
71
+
72
+ # Encode categorical variables
73
+ categorical_cols = ['OpDeviceType', 'Owner', 'Weather', 'EventType']
74
+ for col in categorical_cols:
75
+ daily_data[col] = daily_data[col].fillna('Unknown')
76
+ # Simple frequency encoding
77
+ freq_map = daily_data[col].value_counts().to_dict()
78
+ daily_data[f'{col}_freq'] = daily_data[col].map(freq_map)
79
+
80
+ return daily_data
81
+
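The categorical columns above are frequency encoded: each category is replaced by how often it occurs in the daily table. A small self-contained sketch of that idea (toy values, not from the dataset):

```
import pandas as pd

daily = pd.DataFrame({'Weather': ['Rain', 'Clear', 'Rain', 'Storm', 'Rain']})
freq_map = daily['Weather'].value_counts().to_dict()    # {'Rain': 3, 'Clear': 1, 'Storm': 1}
daily['Weather_freq'] = daily['Weather'].map(freq_map)  # Rain -> 3, Clear -> 1, Storm -> 1
print(daily)
```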
82
+
83
+ def prepare_multivariate_timeseries(df: pd.DataFrame, date_col: str = 'OutageDateTime', target_metric: str = 'count') -> pd.DataFrame:
84
+ """
85
+ Prepare multivariate time series data with multiple features per day.
86
+
87
+ Args:
88
+ df: Input dataframe
89
+ date_col: Date column name
90
+ target_metric: Target metric ('count' or 'downtime_minutes')
91
+
92
+ Returns:
93
+ DataFrame with daily aggregated features
94
+ """
95
+ df = df.copy()
96
+
97
+ # Convert data types properly
98
+ df['dt'] = pd.to_datetime(df[date_col], format='%d-%m-%Y %H:%M:%S', errors='coerce')
99
+ df = df.dropna(subset=['dt'])
100
+ df['day'] = df['dt'].dt.floor('D')
101
+
102
+ # Convert numeric columns
103
+ numeric_cols = ['Load(MW)', 'Capacity(kVA)', 'AffectedCustomer', 'FirstStepDuration', 'LastStepDuration']
104
+ for col in numeric_cols:
105
+ if col in df.columns:
106
+ df[col] = pd.to_numeric(df[col], errors='coerce')
107
+
108
+ # Target variable
109
+ if target_metric == 'count':
110
+ daily_data = df.groupby('day').size().rename('daily_count').reset_index()
111
+ elif target_metric == 'downtime_minutes':
112
+ df['last_dt'] = pd.to_datetime(df.get('LastRestoDateTime'), format='%d-%m-%Y %H:%M:%S', errors='coerce')
113
+ df['duration_min'] = (df['last_dt'] - df['dt']).dt.total_seconds() / 60.0
114
+ daily_data = df.groupby('day')['duration_min'].sum().rename('total_downtime_min').reset_index()
115
+ else:
116
+ raise ValueError('Unsupported target_metric')
117
+
118
+ # Additional features - aggregate per day
119
+ # Numeric features
120
+ numeric_agg = df.groupby('day').agg({
121
+ 'Load(MW)': 'mean',
122
+ 'Capacity(kVA)': 'mean',
123
+ 'AffectedCustomer': 'sum',
124
+ 'FirstStepDuration': 'mean',
125
+ 'LastStepDuration': 'mean'
126
+ }).reset_index()
127
+
128
+ # Time features
129
+ time_features = df.groupby('day').agg({
130
+ 'dt': ['count', lambda x: x.dt.hour.mean(), lambda x: x.dt.weekday.mean()]
131
+ }).reset_index()
132
+ time_features.columns = ['day', 'event_count', 'avg_hour', 'avg_weekday']
133
+
134
+ # Categorical features - take most common per day
135
+ categorical_features = df.groupby('day').agg({
136
+ 'OpDeviceType': lambda x: x.mode().iloc[0] if len(x.mode()) > 0 else 'Unknown',
137
+ 'Owner': lambda x: x.mode().iloc[0] if len(x.mode()) > 0 else 'Unknown',
138
+ 'Weather': lambda x: x.mode().iloc[0] if len(x.mode()) > 0 else 'Unknown',
139
+ 'EventType': lambda x: x.mode().iloc[0] if len(x.mode()) > 0 else 'Unknown'
140
+ }).reset_index()
141
+
142
+ # Merge all features
143
+ daily_data = daily_data.merge(numeric_agg, on='day', how='left')
144
+ daily_data = daily_data.merge(time_features, on='day', how='left')
145
+ daily_data = daily_data.merge(categorical_features, on='day', how='left')
146
+
147
+ # Fill missing values
148
+ daily_data = daily_data.fillna({
149
+ 'Load(MW)': daily_data['Load(MW)'].mean(),
150
+ 'Capacity(kVA)': daily_data['Capacity(kVA)'].mean(),
151
+ 'AffectedCustomer': 0,
152
+ 'FirstStepDuration': daily_data['FirstStepDuration'].mean(),
153
+ 'LastStepDuration': daily_data['LastStepDuration'].mean(),
154
+ 'avg_hour': 12,
155
+ 'avg_weekday': 3
156
+ })
157
+
158
+ # Rename day column to ds for consistency
159
+ daily_data = daily_data.rename(columns={'day': 'ds'})
160
+
161
+ return daily_data
162
+
163
 
164
  def prepare_timeseries(df: pd.DataFrame, date_col: str = 'OutageDateTime', metric: str = 'count') -> pd.DataFrame:
165
+ """Prepare univariate time series data (original function for backward compatibility)"""
166
  # date_col is in format DD-MM-YYYY HH:MM:SS
167
  df = df.copy()
168
  df['dt'] = pd.to_datetime(df[date_col], format='%d-%m-%Y %H:%M:%S', errors='coerce')
 
181
  return ts
182
 
183
 
184
+ def forecast_prophet(ts: pd.DataFrame, periods: int = 7, freq: str = 'D', hyperparams: dict = None) -> pd.DataFrame:
185
  if not PROPHET_AVAILABLE:
186
  raise RuntimeError('Prophet not available')
187
+
188
+ # Set default hyperparameters
189
+ if hyperparams is None:
190
+ hyperparams = {}
191
+
192
+ changepoint_prior_scale = hyperparams.get('changepoint_prior_scale', 0.05)
193
+ seasonality_prior_scale = hyperparams.get('seasonality_prior_scale', 10.0)
194
+ seasonality_mode = hyperparams.get('seasonality_mode', 'additive')
195
+
196
+ m = Prophet(
197
+ changepoint_prior_scale=changepoint_prior_scale,
198
+ seasonality_prior_scale=seasonality_prior_scale,
199
+ seasonality_mode=seasonality_mode
200
+ )
201
  m.fit(ts)
202
  future = m.make_future_dataframe(periods=periods, freq=freq)
203
  fcst = m.predict(future)
 
212
  return pd.DataFrame({'ds': future_dates, 'yhat': [last_mean]*periods, 'yhat_lower':[np.nan]*periods, 'yhat_upper':[np.nan]*periods})
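A short sketch of how the tuned Prophet path above can be exercised; the hyperparameter keys match those read by `forecast_prophet`, the values are only illustrative, and Prophet must be installed (the module path is assumed to be importable from the repo root):

```
import pandas as pd
from scripts.forecast import forecast_prophet

# ts must have the Prophet-style columns 'ds' (date) and 'y' (daily value)
ts = pd.DataFrame({
    'ds': pd.date_range('2024-01-01', periods=60, freq='D'),
    'y': list(range(60)),
})
hyperparams = {
    'changepoint_prior_scale': 0.1,       # more flexible trend
    'seasonality_prior_scale': 5.0,
    'seasonality_mode': 'multiplicative',
}
fcst = forecast_prophet(ts, periods=14, freq='D', hyperparams=hyperparams)
print(fcst[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())
```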
213
 
214
 
215
+ def create_sequences(data, seq_length):
216
+ """Create sequences for time series forecasting"""
217
+ X, y = [], []
218
+ for i in range(len(data) - seq_length):
219
+ X.append(data[i:(i + seq_length)])
220
+ y.append(data[i + seq_length])
221
+ return np.array(X), np.array(y)
222
+
223
+
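To make the windowing concrete, here is what `create_sequences` produces for a short series (assuming the repo root is importable; the shapes are the point):

```
import numpy as np
from scripts.forecast import create_sequences

series = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # 5 time steps, 1 feature
X, y = create_sequences(series, 3)
# X.shape == (2, 3, 1): windows [1,2,3] and [2,3,4]
# y.shape == (2, 1):    targets  [4]      and  [5]
print(X.shape, y.shape)
```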
224
+ def forecast_lstm(ts: pd.DataFrame, periods: int = 7, seq_length: int = 7, hyperparams: dict = None) -> pd.DataFrame:
225
+ """Forecast using LSTM model"""
226
+ if not TF_AVAILABLE:
227
+ raise RuntimeError('TensorFlow not available')
228
+
229
+ # Set default hyperparameters
230
+ if hyperparams is None:
231
+ hyperparams = {}
232
+
233
+ seq_length = hyperparams.get('seq_length', seq_length) # Use hyperparams seq_length if provided
234
+ epochs = hyperparams.get('epochs', 100)
235
+ batch_size = hyperparams.get('batch_size', 16)
236
+ learning_rate = hyperparams.get('learning_rate', 0.001)
237
+ units = hyperparams.get('units', 100)
238
+ dropout_rate = hyperparams.get('dropout_rate', 0.2)
239
+
240
+ # Prepare data
241
+ data = ts['y'].values.reshape(-1, 1)
242
+ scaler = MinMaxScaler(feature_range=(0, 1))
243
+ scaled_data = scaler.fit_transform(data)
244
+
245
+ # Create sequences
246
+ X, y = create_sequences(scaled_data, seq_length)
247
+
248
+ if len(X) < 10: # Not enough data
249
+ return forecast_naive(ts, periods)
250
+
251
+ # Split data
252
+ train_size = int(len(X) * 0.8)
253
+ X_train, X_test = X[:train_size], X[train_size:]
254
+ y_train, y_test = y[:train_size], y[train_size:]
255
+
256
+ # Reshape for LSTM [samples, time steps, features]
257
+ X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
258
+ X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
259
+
260
+ # Build LSTM model
261
+ model = Sequential([
262
+ LSTM(units, activation='relu', return_sequences=True, input_shape=(seq_length, 1)),
263
+ Dropout(dropout_rate),
264
+ LSTM(units//2, activation='relu'),
265
+ Dropout(dropout_rate),
266
+ Dense(1)
267
+ ])
268
+
269
+ optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
270
+ model.compile(optimizer=optimizer, loss='mse')
271
+
272
+ # Train model
273
+ model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0, validation_data=(X_test, y_test))
274
+
275
+ # Make predictions
276
+ predictions = []
277
+ current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, 1)
278
+
279
+ for _ in range(periods):
280
+ pred = model.predict(current_sequence, verbose=0)
281
+ predictions.append(pred[0][0])
282
+ # Update sequence for next prediction
283
+ current_sequence = np.roll(current_sequence, -1, axis=1)
284
+ current_sequence[0, -1, 0] = pred[0][0]
285
+
286
+ # Inverse transform predictions
287
+ predictions = scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
288
+
289
+ # Create forecast dataframe
290
+ last_date = ts['ds'].max()
291
+ future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
292
+
293
+ return pd.DataFrame({
294
+ 'ds': future_dates,
295
+ 'yhat': predictions,
296
+ 'yhat_lower': predictions * 0.8, # Simple confidence intervals
297
+ 'yhat_upper': predictions * 1.2
298
+ })
299
+
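Worth noting for the block above: forecasts are produced recursively, with each prediction fed back into the input window, and the ±20% band is a heuristic rather than a statistical interval. A usage sketch, assuming TensorFlow is installed, the module path is importable, and the daily series has at least a couple of weeks of data (the CSV path is hypothetical):

```
import pandas as pd
from scripts.forecast import prepare_timeseries, forecast_lstm

# Raw OMS event table with an 'OutageDateTime' column in DD-MM-YYYY HH:MM:SS format
df = pd.read_csv('data/oms_events.csv', dtype=str)       # hypothetical path
ts = prepare_timeseries(df, metric='count')               # -> columns 'ds', 'y'
fcst = forecast_lstm(
    ts,
    periods=7,
    hyperparams={'seq_length': 7, 'epochs': 50, 'units': 64, 'dropout_rate': 0.2},
)
print(fcst)
```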
300
+
301
+ def forecast_bilstm(ts: pd.DataFrame, periods: int = 7, seq_length: int = 7, hyperparams: dict = None) -> pd.DataFrame:
302
+ """Forecast using Bi-LSTM model"""
303
+ if not TF_AVAILABLE:
304
+ raise RuntimeError('TensorFlow not available')
305
+
306
+ # Set default hyperparameters
307
+ if hyperparams is None:
308
+ hyperparams = {}
309
+ seq_length = hyperparams.get('seq_length', seq_length) # Use hyperparams seq_length if provided
310
+ epochs = hyperparams.get('epochs', 50)
311
+ batch_size = hyperparams.get('batch_size', 16)
312
+ learning_rate = hyperparams.get('learning_rate', 0.001)
313
+ units = hyperparams.get('units', 50)
314
+ dropout_rate = hyperparams.get('dropout_rate', 0.2)
315
+
316
+ # Prepare data
317
+ data = ts['y'].values.reshape(-1, 1)
318
+ scaler = MinMaxScaler(feature_range=(0, 1))
319
+ scaled_data = scaler.fit_transform(data)
320
+
321
+ # Create sequences
322
+ X, y = create_sequences(scaled_data, seq_length)
323
+
324
+ if len(X) < 10: # Not enough data
325
+ return forecast_naive(ts, periods)
326
+
327
+ # Split data
328
+ train_size = int(len(X) * 0.8)
329
+ X_train, X_test = X[:train_size], X[train_size:]
330
+ y_train, y_test = y[:train_size], y[train_size:]
331
+
332
+ # Reshape for Bi-LSTM [samples, time steps, features]
333
+ X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
334
+ X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
335
+
336
+ # Build Bi-LSTM model
337
+ model = Sequential([
338
+ Bidirectional(LSTM(units, activation='relu'), input_shape=(seq_length, 1)),
339
+ Dropout(dropout_rate),
340
+ Dense(1)
341
+ ])
342
+
343
+ optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
344
+ model.compile(optimizer=optimizer, loss='mse')
345
+
346
+ # Train model
347
+ model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0, validation_data=(X_test, y_test))
348
+
349
+ # Make predictions
350
+ predictions = []
351
+ current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, 1)
352
+
353
+ for _ in range(periods):
354
+ pred = model.predict(current_sequence, verbose=0)
355
+ predictions.append(pred[0][0])
356
+ # Update sequence for next prediction
357
+ current_sequence = np.roll(current_sequence, -1, axis=1)
358
+ current_sequence[0, -1, 0] = pred[0][0]
359
+
360
+ # Inverse transform predictions
361
+ predictions = scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
362
+
363
+ # Create forecast dataframe
364
+ last_date = ts['ds'].max()
365
+ future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
366
+
367
+ return pd.DataFrame({
368
+ 'ds': future_dates,
369
+ 'yhat': predictions,
370
+ 'yhat_lower': predictions * 0.8,
371
+ 'yhat_upper': predictions * 1.2
372
+ })
373
+
374
+
375
+ def forecast_gru(ts: pd.DataFrame, periods: int = 7, seq_length: int = 7, hyperparams: dict = None) -> pd.DataFrame:
376
+ """Forecast using GRU model"""
377
+ if not TF_AVAILABLE:
378
+ raise RuntimeError('TensorFlow not available')
379
+
380
+ # Set default hyperparameters
381
+ if hyperparams is None:
382
+ hyperparams = {}
383
+ seq_length = hyperparams.get('seq_length', seq_length) # Use hyperparams seq_length if provided
384
+ epochs = hyperparams.get('epochs', 50)
385
+ batch_size = hyperparams.get('batch_size', 16)
386
+ learning_rate = hyperparams.get('learning_rate', 0.001)
387
+ units = hyperparams.get('units', 50)
388
+ dropout_rate = hyperparams.get('dropout_rate', 0.2)
389
+
390
+ # Prepare data
391
+ data = ts['y'].values.reshape(-1, 1)
392
+ scaler = MinMaxScaler(feature_range=(0, 1))
393
+ scaled_data = scaler.fit_transform(data)
394
+
395
+ # Create sequences
396
+ X, y = create_sequences(scaled_data, seq_length)
397
+
398
+ if len(X) < 10: # Not enough data
399
+ return forecast_naive(ts, periods)
400
+
401
+ # Split data
402
+ train_size = int(len(X) * 0.8)
403
+ X_train, X_test = X[:train_size], X[train_size:]
404
+ y_train, y_test = y[:train_size], y[train_size:]
405
+
406
+ # Reshape for GRU [samples, time steps, features]
407
+ X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
408
+ X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
409
+
410
+ # Build GRU model
411
+ model = Sequential([
412
+ GRU(units, activation='relu', input_shape=(seq_length, 1)),
413
+ Dropout(dropout_rate),
414
+ Dense(1)
415
+ ])
416
+
417
+ optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
418
+ model.compile(optimizer=optimizer, loss='mse')
419
+
420
+ # Train model
421
+ model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0, validation_data=(X_test, y_test))
422
+
423
+ # Make predictions
424
+ predictions = []
425
+ current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, 1)
426
+
427
+ for _ in range(periods):
428
+ pred = model.predict(current_sequence, verbose=0)
429
+ predictions.append(pred[0][0])
430
+ # Update sequence for next prediction
431
+ current_sequence = np.roll(current_sequence, -1, axis=1)
432
+ current_sequence[0, -1, 0] = pred[0][0]
433
+
434
+ # Inverse transform predictions
435
+ predictions = scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
436
+
437
+ # Create forecast dataframe
438
+ last_date = ts['ds'].max()
439
+ future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
440
+
441
+ return pd.DataFrame({
442
+ 'ds': future_dates,
443
+ 'yhat': predictions,
444
+ 'yhat_lower': predictions * 0.8,
445
+ 'yhat_upper': predictions * 1.2
446
+ })
447
+ """Forecast using multivariate LSTM model"""
448
+ if not TF_AVAILABLE:
449
+ raise RuntimeError('TensorFlow not available')
450
+
451
+ # Select features for multivariate forecasting
452
+ feature_cols = [col for col in ts.columns if col not in ['ds', 'OpDeviceType', 'Owner', 'Weather', 'EventType']]
453
+ if target_col not in feature_cols:
454
+ raise ValueError(f"Target column '{target_col}' not found in features")
455
+
456
+ # Prepare data
457
+ data = ts[feature_cols].values
458
+ scaler = MinMaxScaler(feature_range=(0, 1))
459
+ scaled_data = scaler.fit_transform(data)
460
+
461
+ # Create sequences
462
+ X, y = create_sequences(scaled_data, seq_length)
463
+
464
+ if len(X) < 10: # Not enough data
465
+ return forecast_naive(ts[['ds', target_col]].rename(columns={target_col: 'y'}), periods)
466
+
467
+ # Split data
468
+ train_size = int(len(X) * 0.8)
469
+ X_train, X_test = X[:train_size], X[train_size:]
470
+ y_train, y_test = y[:train_size], y[train_size:]
471
+
472
+ # Reshape for LSTM [samples, time steps, features]
473
+ X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], len(feature_cols)))
474
+ X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], len(feature_cols)))
475
+
476
+ # Build multivariate LSTM model
477
+ model = Sequential([
478
+ LSTM(100, activation='relu', return_sequences=True, input_shape=(seq_length, len(feature_cols))),
479
+ Dropout(0.2),
480
+ LSTM(50, activation='relu'),
481
+ Dropout(0.2),
482
+ Dense(1)
483
+ ])
484
+
485
+ model.compile(optimizer='adam', loss='mse')
486
+
487
+ # Train model
488
+ model.fit(X_train, y_train, epochs=100, batch_size=16, verbose=0, validation_data=(X_test, y_test))
489
+
490
+ # Make predictions
491
+ predictions = []
492
+ current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, len(feature_cols))
493
+
494
+ for _ in range(periods):
495
+ pred = model.predict(current_sequence, verbose=0)
496
+ predictions.append(pred[0][0])
497
+
498
+ # Update sequence for next prediction (use predicted value for target, keep other features)
499
+ new_row = current_sequence[0, -1, :].copy()
500
+ new_row[feature_cols.index(target_col)] = pred[0][0] # Update target with prediction
501
+
502
+ current_sequence = np.roll(current_sequence, -1, axis=1)
503
+ current_sequence[0, -1, :] = new_row
504
+
505
+ # Inverse transform predictions (only for target column)
506
+ target_scaler = MinMaxScaler(feature_range=(0, 1))
507
+ target_scaler.fit(data[:, feature_cols.index(target_col)].reshape(-1, 1))
508
+ predictions = target_scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
509
+
510
+ # Create forecast dataframe
511
+ last_date = ts['ds'].max()
512
+ future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
513
+
514
+ return pd.DataFrame({
515
+ 'ds': future_dates,
516
+ 'yhat': predictions,
517
+ 'yhat_lower': predictions * 0.8,
518
+ 'yhat_upper': predictions * 1.2
519
+ })
520
+ """Forecast using LSTM model"""
521
+ if not TF_AVAILABLE:
522
+ raise RuntimeError('TensorFlow not available')
523
+
524
+ # Prepare data
525
+ data = ts['y'].values.reshape(-1, 1)
526
+ scaler = MinMaxScaler(feature_range=(0, 1))
527
+ scaled_data = scaler.fit_transform(data)
528
+
529
+ # Create sequences
530
+ X, y = create_sequences(scaled_data, seq_length)
531
+
532
+ if len(X) < 10: # Not enough data
533
+ return forecast_naive(ts, periods)
534
+
535
+ # Split data
536
+ train_size = int(len(X) * 0.8)
537
+ X_train, X_test = X[:train_size], X[train_size:]
538
+ y_train, y_test = y[:train_size], y[train_size:]
539
+
540
+ # Reshape for LSTM [samples, time steps, features]
541
+ X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
542
+ X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
543
+
544
+ # Build LSTM model
545
+ model = Sequential([
546
+ LSTM(50, activation='relu', input_shape=(seq_length, 1)),
547
+ Dropout(0.2),
548
+ Dense(1)
549
+ ])
550
+
551
+ model.compile(optimizer='adam', loss='mse')
552
+
553
+ # Train model
554
+ model.fit(X_train, y_train, epochs=50, batch_size=16, verbose=0, validation_data=(X_test, y_test))
555
+
556
+ # Make predictions
557
+ predictions = []
558
+ current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, 1)
559
+
560
+ for _ in range(periods):
561
+ pred = model.predict(current_sequence, verbose=0)
562
+ predictions.append(pred[0][0])
563
+ # Update sequence for next prediction
564
+ current_sequence = np.roll(current_sequence, -1, axis=1)
565
+ current_sequence[0, -1, 0] = pred[0][0]
566
+
567
+ # Inverse transform predictions
568
+ predictions = scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
569
+
570
+ # Create forecast dataframe
571
+ last_date = ts['ds'].max()
572
+ future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
573
+
574
+ return pd.DataFrame({
575
+ 'ds': future_dates,
576
+ 'yhat': predictions,
577
+ 'yhat_lower': predictions * 0.8, # Simple confidence intervals
578
+ 'yhat_upper': predictions * 1.2
579
+ })
580
+
581
+
582
+ def forecast_bilstm_multivariate(ts: pd.DataFrame, periods: int = 7, seq_length: int = 7, target_col: str = 'daily_count', hyperparams: dict = None) -> pd.DataFrame:
583
+ """Forecast using multivariate Bi-LSTM model"""
584
+ if not TF_AVAILABLE:
585
+ raise RuntimeError('TensorFlow not available')
586
+
587
+ # Set default hyperparameters
588
+ if hyperparams is None:
589
+ hyperparams = {}
590
+ epochs = hyperparams.get('epochs', 100)
591
+ batch_size = hyperparams.get('batch_size', 16)
592
+ learning_rate = hyperparams.get('learning_rate', 0.001)
593
+ units = hyperparams.get('units', 100)
594
+ dropout_rate = hyperparams.get('dropout_rate', 0.2)
595
+
596
+ # Select features for multivariate forecasting
597
+ feature_cols = [col for col in ts.columns if col not in ['ds', 'OpDeviceType', 'Owner', 'Weather', 'EventType']]
598
+ if target_col not in feature_cols:
599
+ raise ValueError(f"Target column '{target_col}' not found in features")
600
+
601
+ # Prepare data
602
+ data = ts[feature_cols].values
603
+ scaler = MinMaxScaler(feature_range=(0, 1))
604
+ scaled_data = scaler.fit_transform(data)
605
+
606
+ # Create sequences
607
+ X, y = create_sequences(scaled_data, seq_length)
608
+
609
+ if len(X) < 10: # Not enough data
610
+ return forecast_naive(ts[['ds', target_col]].rename(columns={target_col: 'y'}), periods)
611
+
612
+ # Split data
613
+ train_size = int(len(X) * 0.8)
614
+ X_train, X_test = X[:train_size], X[train_size:]
615
+ y_train, y_test = y[:train_size], y[train_size:]
616
+
617
+ # Reshape for Bi-LSTM [samples, time steps, features]
618
+ X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], len(feature_cols)))
619
+ X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], len(feature_cols)))
620
+
621
+ # Build multivariate Bi-LSTM model
622
+ model = Sequential([
623
+ Bidirectional(LSTM(units, activation='relu', return_sequences=True), input_shape=(seq_length, len(feature_cols))),
624
+ Dropout(dropout_rate),
625
+ Bidirectional(LSTM(units//2, activation='relu')),
626
+ Dropout(dropout_rate),
627
+ Dense(1)
628
+ ])
629
+
630
+ optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
631
+ model.compile(optimizer=optimizer, loss='mse')
632
+
633
+ # Train model
634
+ model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0, validation_data=(X_test, y_test))
635
+
636
+ # Make predictions
637
+ predictions = []
638
+ current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, len(feature_cols))
639
+
640
+ for _ in range(periods):
641
+ pred = model.predict(current_sequence, verbose=0)
642
+ predictions.append(pred[0][0])
643
+
644
+ # Update sequence for next prediction
645
+ new_row = current_sequence[0, -1, :].copy()
646
+ new_row[feature_cols.index(target_col)] = pred[0][0]
647
+
648
+ current_sequence = np.roll(current_sequence, -1, axis=1)
649
+ current_sequence[0, -1, :] = new_row
650
+
651
+ # Inverse transform predictions
652
+ target_scaler = MinMaxScaler(feature_range=(0, 1))
653
+ target_scaler.fit(data[:, feature_cols.index(target_col)].reshape(-1, 1))
654
+ predictions = target_scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
655
+
656
+ # Create forecast dataframe
657
+ last_date = ts['ds'].max()
658
+ future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
659
+
660
+ return pd.DataFrame({
661
+ 'ds': future_dates,
662
+ 'yhat': predictions,
663
+ 'yhat_lower': predictions * 0.8,
664
+ 'yhat_upper': predictions * 1.2
665
+ })
666
+ """Forecast using Bi-LSTM model"""
667
+ if not TF_AVAILABLE:
668
+ raise RuntimeError('TensorFlow not available')
669
+
670
+ # Prepare data
671
+ data = ts['y'].values.reshape(-1, 1)
672
+ scaler = MinMaxScaler(feature_range=(0, 1))
673
+ scaled_data = scaler.fit_transform(data)
674
+
675
+ # Create sequences
676
+ X, y = create_sequences(scaled_data, seq_length)
677
+
678
+ if len(X) < 10: # Not enough data
679
+ return forecast_naive(ts, periods)
680
+
681
+ # Split data
682
+ train_size = int(len(X) * 0.8)
683
+ X_train, X_test = X[:train_size], X[train_size:]
684
+ y_train, y_test = y[:train_size], y[train_size:]
685
+
686
+ # Reshape for Bi-LSTM [samples, time steps, features]
687
+ X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
688
+ X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
689
+
690
+ # Build Bi-LSTM model
691
+ model = Sequential([
692
+ Bidirectional(LSTM(50, activation='relu'), input_shape=(seq_length, 1)),
693
+ Dropout(0.2),
694
+ Dense(1)
695
+ ])
696
+
697
+ model.compile(optimizer='adam', loss='mse')
698
+
699
+ # Train model
700
+ model.fit(X_train, y_train, epochs=50, batch_size=16, verbose=0, validation_data=(X_test, y_test))
701
+
702
+ # Make predictions
703
+ predictions = []
704
+ current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, 1)
705
+
706
+ for _ in range(periods):
707
+ pred = model.predict(current_sequence, verbose=0)
708
+ predictions.append(pred[0][0])
709
+ # Update sequence for next prediction
710
+ current_sequence = np.roll(current_sequence, -1, axis=1)
711
+ current_sequence[0, -1, 0] = pred[0][0]
712
+
713
+ # Inverse transform predictions
714
+ predictions = scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
715
+
716
+ # Create forecast dataframe
717
+ last_date = ts['ds'].max()
718
+ future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
719
+
720
+ return pd.DataFrame({
721
+ 'ds': future_dates,
722
+ 'yhat': predictions,
723
+ 'yhat_lower': predictions * 0.8,
724
+ 'yhat_upper': predictions * 1.2
725
+ })
726
+
727
+
728
+ def forecast_multivariate_lstm(ts: pd.DataFrame, target_col: str = 'target_count', periods: int = 7, seq_length: int = 7) -> pd.DataFrame:
729
+ """Forecast using multivariate LSTM model"""
730
+ if not TF_AVAILABLE:
731
+ raise RuntimeError('TensorFlow not available')
732
+
733
+ # Prepare data - exclude date column and target
734
+ feature_cols = [col for col in ts.columns if col not in ['ds', target_col]]
735
+ target_data = ts[target_col].values.reshape(-1, 1)
736
+
737
+ # Handle categorical features - simple label encoding for demo
738
+ ts_encoded = ts.copy()
739
+ for col in feature_cols:
740
+ if ts[col].dtype == 'object':
741
+ # Simple label encoding
742
+ unique_vals = ts[col].unique()
743
+ val_to_int = {val: i for i, val in enumerate(unique_vals)}
744
+ ts_encoded[col] = ts[col].map(val_to_int)
745
+
746
+ feature_data = ts_encoded[feature_cols].values
747
+
748
+ # Scale features and target separately
749
+ feature_scaler = MinMaxScaler(feature_range=(0, 1))
750
+ target_scaler = MinMaxScaler(feature_range=(0, 1))
751
+
752
+ scaled_features = feature_scaler.fit_transform(feature_data)
753
+ scaled_target = target_scaler.fit_transform(target_data)
754
+
755
+ # Combine features and target for sequences
756
+ combined_data = np.column_stack([scaled_features, scaled_target])
757
+
758
+ # Create sequences
759
+ X, y = create_sequences(combined_data, seq_length)
760
+
761
+ if len(X) < 10: # Not enough data
762
+ # Fallback to univariate naive
763
+ univariate_ts = ts[['ds', target_col]].rename(columns={target_col: 'y'})
764
+ return forecast_naive(univariate_ts, periods)
765
+
766
+ # Split data
767
+ train_size = int(len(X) * 0.8)
768
+ X_train, X_test = X[:train_size], X[train_size:]
769
+ y_train, y_test = y[:train_size], y[train_size:]
770
+
771
+ # X shape: [samples, time_steps, features]
772
+ n_features = combined_data.shape[1]
773
+
774
+ # Build multivariate LSTM model
775
+ model = Sequential([
776
+ LSTM(64, activation='relu', input_shape=(seq_length, n_features), return_sequences=True),
777
+ Dropout(0.2),
778
+ LSTM(32, activation='relu'),
779
+ Dropout(0.2),
780
+ Dense(16, activation='relu'),
781
+ Dense(1)
782
+ ])
783
+
784
+ model.compile(optimizer='adam', loss='mse')
785
+
786
+ # Train model
787
+ model.fit(X_train, y_train, epochs=50, batch_size=16, verbose=0, validation_data=(X_test, y_test))
788
+
789
+ # Make predictions
790
+ predictions = []
791
+ current_sequence = combined_data[-seq_length:].reshape(1, seq_length, n_features)
792
+
793
+ for _ in range(periods):
794
+ pred = model.predict(current_sequence, verbose=0)
795
+ predictions.append(pred[0][0])
796
+
797
+ # For next prediction, we need to estimate future features
798
+ # For simplicity, use the last known feature values
799
+ next_features = current_sequence[0, -1, :-1] # All features except target
800
+ next_sequence = np.column_stack([next_features, pred[0][0]]) # Add predicted target
801
+
802
+ # Update sequence
803
+ current_sequence = np.roll(current_sequence, -1, axis=1)
804
+ current_sequence[0, -1, :] = next_sequence
805
+
806
+ # Inverse transform predictions
807
+ predictions = target_scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
808
+
809
+ # Create forecast dataframe
810
+ last_date = ts['ds'].max()
811
+ future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
812
+
813
+ return pd.DataFrame({
814
+ 'ds': future_dates,
815
+ 'yhat': predictions,
816
+ 'yhat_lower': predictions * 0.8,
817
+ 'yhat_upper': predictions * 1.2
818
+ })
819
+
820
+
821
+ def forecast_multivariate_gru(ts: pd.DataFrame, target_col: str = 'target_count', periods: int = 7, seq_length: int = 7) -> pd.DataFrame:
822
+ """Forecast using multivariate GRU model"""
823
+ if not TF_AVAILABLE:
824
+ raise RuntimeError('TensorFlow not available')
825
+
826
+ # Similar to multivariate LSTM but using GRU layers
827
+ feature_cols = [col for col in ts.columns if col not in ['ds', target_col]]
828
+ target_data = ts[target_col].values.reshape(-1, 1)
829
+
830
+ # Handle categorical features
831
+ ts_encoded = ts.copy()
832
+ for col in feature_cols:
833
+ if ts[col].dtype == 'object':
834
+ unique_vals = ts[col].unique()
835
+ val_to_int = {val: i for i, val in enumerate(unique_vals)}
836
+ ts_encoded[col] = ts[col].map(val_to_int)
837
+
838
+ feature_data = ts_encoded[feature_cols].values
839
+
840
+ # Scale data
841
+ feature_scaler = MinMaxScaler(feature_range=(0, 1))
842
+ target_scaler = MinMaxScaler(feature_range=(0, 1))
843
+
844
+ scaled_features = feature_scaler.fit_transform(feature_data)
845
+ scaled_target = target_scaler.fit_transform(target_data)
846
+
847
+ combined_data = np.column_stack([scaled_features, scaled_target])
848
+
849
+ # Create sequences
850
+ X, y = create_sequences(combined_data, seq_length)
851
+
852
+ if len(X) < 10:
853
+ univariate_ts = ts[['ds', target_col]].rename(columns={target_col: 'y'})
854
+ return forecast_naive(univariate_ts, periods)
855
+
856
+ # Split data
857
+ train_size = int(len(X) * 0.8)
858
+ X_train, X_test = X[:train_size], X[train_size:]
859
+ y_train, y_test = y[:train_size], y[train_size:]
860
+
861
+ n_features = combined_data.shape[1]
862
+
863
+ # Build multivariate GRU model
864
+ model = Sequential([
865
+ GRU(64, activation='relu', input_shape=(seq_length, n_features), return_sequences=True),
866
+ Dropout(0.2),
867
+ GRU(32, activation='relu'),
868
+ Dropout(0.2),
869
+ Dense(16, activation='relu'),
870
+ Dense(1)
871
+ ])
872
+
873
+ model.compile(optimizer='adam', loss='mse')
874
+
875
+ # Train model
876
+ model.fit(X_train, y_train, epochs=50, batch_size=16, verbose=0, validation_data=(X_test, y_test))
877
+
878
+ # Make predictions (same logic as LSTM)
879
+ predictions = []
880
+ current_sequence = combined_data[-seq_length:].reshape(1, seq_length, n_features)
881
+
882
+ for _ in range(periods):
883
+ pred = model.predict(current_sequence, verbose=0)
884
+ predictions.append(pred[0][0])
885
+
886
+ next_features = current_sequence[0, -1, :-1]
887
+ next_sequence = np.column_stack([next_features, pred[0][0]])
888
+
889
+ current_sequence = np.roll(current_sequence, -1, axis=1)
890
+ current_sequence[0, -1, :] = next_sequence
891
+
892
+ # Inverse transform predictions
893
+ predictions = target_scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
894
+
895
+ # Create forecast dataframe
896
+ last_date = ts['ds'].max()
897
+ future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
898
+
899
+ return pd.DataFrame({
900
+ 'ds': future_dates,
901
+ 'yhat': predictions,
902
+ 'yhat_lower': predictions * 0.8,
903
+ 'yhat_upper': predictions * 1.2
904
+ })
905
+
906
+
907
+ def run_forecast(df: pd.DataFrame, metric: str = 'count', periods: int = 7, model_type: str = 'prophet', multivariate: bool = False, target_col: str = 'daily_count', hyperparams: dict = None):
908
+ """
909
+ Run forecasting with specified model type.
910
+
911
+ Args:
912
+ df: Input dataframe
913
+ metric: 'count' or 'downtime_minutes' (for univariate)
914
+ periods: Number of periods to forecast
915
+ model_type: 'prophet', 'lstm', 'bilstm', 'gru', or 'naive'
916
+ multivariate: Whether to use multivariate forecasting
917
+ target_col: Target column for multivariate forecasting ('daily_count' or 'total_downtime_min')
918
+ hyperparams: Dictionary of hyperparameters for the model
919
+ """
920
+ if multivariate:
921
+ ts = prepare_multivariate_timeseries(df, target_metric=metric)
922
+ # Map metric to target column
923
+ if metric == 'count':
924
+ target_col = 'daily_count'
925
+ elif metric == 'downtime_minutes':
926
+ target_col = 'total_downtime_min'
927
+ else:
928
+ target_col = 'daily_count'
929
+
930
+ if model_type == 'lstm':
931
+ if TF_AVAILABLE and len(ts) >= 14:
932
+ try:
933
+ fcst = forecast_lstm_multivariate(ts, periods=periods, target_col=target_col, hyperparams=hyperparams)
934
+ return ts, fcst
935
+ except Exception as e:
936
+ warnings.warn(f'Multivariate LSTM failed: {e}, falling back to univariate')
937
+ # Fallback to univariate
938
+ univariate_ts = prepare_timeseries(df, metric=metric)
939
+ fcst = forecast_naive(univariate_ts, periods=periods)
940
+ return univariate_ts, fcst
941
+
942
+ elif model_type == 'bilstm':
943
+ if TF_AVAILABLE and len(ts) >= 14:
944
+ try:
945
+ fcst = forecast_bilstm_multivariate(ts, periods=periods, target_col=target_col, hyperparams=hyperparams)
946
+ return ts, fcst
947
+ except Exception as e:
948
+ warnings.warn(f'Multivariate Bi-LSTM failed: {e}, falling back to univariate')
949
+ # Fallback to univariate
950
+ univariate_ts = prepare_timeseries(df, metric=metric)
951
+ fcst = forecast_naive(univariate_ts, periods=periods)
952
+ return univariate_ts, fcst
953
+
954
+ elif model_type == 'gru':
955
+ if TF_AVAILABLE and len(ts) >= 14:
956
+ try:
957
+ fcst = forecast_gru_multivariate(ts, periods=periods, target_col=target_col, hyperparams=hyperparams)
958
+ return ts, fcst
959
+ except Exception as e:
960
+ warnings.warn(f'Multivariate GRU failed: {e}, falling back to univariate')
961
+ # Fallback to univariate
962
+ univariate_ts = prepare_timeseries(df, metric=metric)
963
+ fcst = forecast_naive(univariate_ts, periods=periods)
964
+ return univariate_ts, fcst
965
+
966
+ else:
967
+ # For prophet and other models, fall back to univariate
968
+ if multivariate:
969
+ warnings.warn(f'Model {model_type} does not support multivariate forecasting. Using univariate {model_type} instead.')
970
+ univariate_ts = prepare_timeseries(df, metric=metric)
971
+ if model_type == 'prophet':
972
+ if PROPHET_AVAILABLE and len(univariate_ts) >= 14:
973
+ try:
974
+ fcst = forecast_prophet(univariate_ts, periods=periods, hyperparams=hyperparams)
975
+ return univariate_ts, fcst
976
+ except Exception:
977
+ warnings.warn('Prophet failed, falling back to naive')
978
+ fcst = forecast_naive(univariate_ts, periods=periods)
979
+ else:
980
+ fcst = forecast_naive(univariate_ts, periods=periods)
981
+ return univariate_ts, fcst
982
+
983
+ else:
984
+ # Use univariate approach (original logic)
985
+ ts = prepare_timeseries(df, metric=metric)
986
+
987
+ if model_type == 'prophet':
988
+ if PROPHET_AVAILABLE and len(ts) >= 14:
989
+ try:
990
+ fcst = forecast_prophet(ts, periods=periods, hyperparams=hyperparams)
991
+ return ts, fcst
992
+ except Exception:
993
+ warnings.warn('Prophet failed, falling back to naive')
994
+ fcst = forecast_naive(ts, periods=periods)
995
+
996
+ elif model_type == 'lstm':
997
+ if TF_AVAILABLE and len(ts) >= 14:
998
+ try:
999
+ fcst = forecast_lstm(ts, periods=periods, hyperparams=hyperparams)
1000
+ return ts, fcst
1001
+ except Exception as e:
1002
+ warnings.warn(f'LSTM failed: {e}, falling back to naive')
1003
+ fcst = forecast_naive(ts, periods=periods)
1004
+
1005
+ elif model_type == 'bilstm':
1006
+ if TF_AVAILABLE and len(ts) >= 14:
1007
+ try:
1008
+ fcst = forecast_bilstm(ts, periods=periods, hyperparams=hyperparams)
1009
+ return ts, fcst
1010
+ except Exception as e:
1011
+ warnings.warn(f'Bi-LSTM failed: {e}, falling back to naive')
1012
+ fcst = forecast_naive(ts, periods=periods)
1013
+
1014
+ elif model_type == 'gru':
1015
+ if TF_AVAILABLE and len(ts) >= 14:
1016
+ try:
1017
+ fcst = forecast_gru(ts, periods=periods, hyperparams=hyperparams)
1018
+ return ts, fcst
1019
+ except Exception as e:
1020
+ warnings.warn(f'GRU failed: {e}, falling back to naive')
1021
+ fcst = forecast_naive(ts, periods=periods)
1022
+
1023
+ else: # naive or unknown model_type
1024
+ fcst = forecast_naive(ts, periods=periods)
1025
+
1026
+ return ts, fcst
1027
+
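A sketch of how the dispatcher above is likely called from the app layer; the model names and hyperparameter keys are the ones handled in `run_forecast`, while the file path is only an example:

```
import pandas as pd
from scripts.forecast import run_forecast

df = pd.read_csv('data/oms_events.csv', dtype=str)   # hypothetical input

# Univariate Prophet forecast of daily outage counts, 14 days ahead
ts, fcst = run_forecast(df, metric='count', periods=14, model_type='prophet')

# Multivariate LSTM forecast of total daily downtime, with explicit hyperparameters
ts_mv, fcst_mv = run_forecast(
    df,
    metric='downtime_minutes',
    periods=7,
    model_type='lstm',
    multivariate=True,
    hyperparams={'seq_length': 7, 'epochs': 50, 'units': 64},
)
print(fcst[['ds', 'yhat']].tail())
print(fcst_mv[['ds', 'yhat']])
```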
1028
+
1029
+ def forecast_gru_multivariate(ts: pd.DataFrame, periods: int = 7, seq_length: int = 7, target_col: str = 'daily_count', hyperparams: dict = None) -> pd.DataFrame:
1030
+ """Forecast using multivariate GRU model"""
1031
+ if not TF_AVAILABLE:
1032
+ raise RuntimeError('TensorFlow not available')
1033
+
1034
+ # Set default hyperparameters
1035
+ if hyperparams is None:
1036
+ hyperparams = {}
1037
+ epochs = hyperparams.get('epochs', 100)
1038
+ batch_size = hyperparams.get('batch_size', 16)
1039
+ learning_rate = hyperparams.get('learning_rate', 0.001)
1040
+ units = hyperparams.get('units', 100)
1041
+ dropout_rate = hyperparams.get('dropout_rate', 0.2)
1042
+
1043
+ # Select features for multivariate forecasting
1044
+ feature_cols = [col for col in ts.columns if col not in ['ds', 'OpDeviceType', 'Owner', 'Weather', 'EventType']]
1045
+ if target_col not in feature_cols:
1046
+ raise ValueError(f"Target column '{target_col}' not found in features")
1047
+
1048
+ # Prepare data
1049
+ data = ts[feature_cols].values
1050
+ scaler = MinMaxScaler(feature_range=(0, 1))
1051
+ scaled_data = scaler.fit_transform(data)
1052
+
1053
+ # Create sequences
1054
+ X, y = create_sequences(scaled_data, seq_length)
1055
+
1056
+ if len(X) < 10: # Not enough data
1057
+ return forecast_naive(ts[['ds', target_col]].rename(columns={target_col: 'y'}), periods)
1058
+
1059
+ # Split data
1060
+ train_size = int(len(X) * 0.8)
1061
+ X_train, X_test = X[:train_size], X[train_size:]
1062
+ y_train, y_test = y[:train_size], y[train_size:]
1063
+
1064
+ # Reshape for GRU [samples, time steps, features]
1065
+ X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], len(feature_cols)))
1066
+ X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], len(feature_cols)))
1067
+
1068
+ # Build multivariate GRU model
1069
+ model = Sequential([
1070
+ GRU(units, activation='relu', return_sequences=True, input_shape=(seq_length, len(feature_cols))),
1071
+ Dropout(dropout_rate),
1072
+ GRU(units//2, activation='relu'),
1073
+ Dropout(dropout_rate),
1074
+ Dense(1)
1075
+ ])
1076
+
1077
+ optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
1078
+ model.compile(optimizer=optimizer, loss='mse')
1079
+
1080
+ # Train model
1081
+ model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0, validation_data=(X_test, y_test))
1082
+
1083
+ # Make predictions
1084
+ predictions = []
1085
+ current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, len(feature_cols))
1086
+
1087
+ for _ in range(periods):
1088
+ pred = model.predict(current_sequence, verbose=0)
1089
+ predictions.append(pred[0][0])
1090
+
1091
+ # Update sequence for next prediction
1092
+ new_row = current_sequence[0, -1, :].copy()
1093
+ new_row[feature_cols.index(target_col)] = pred[0][0]
1094
+
1095
+ current_sequence = np.roll(current_sequence, -1, axis=1)
1096
+ current_sequence[0, -1, :] = new_row
1097
+
1098
+ # Inverse transform predictions
1099
+ target_scaler = MinMaxScaler(feature_range=(0, 1))
1100
+ target_scaler.fit(data[:, feature_cols.index(target_col)].reshape(-1, 1))
1101
+ predictions = target_scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
1102
+
1103
+ # Create forecast dataframe
1104
+ last_date = ts['ds'].max()
1105
+ future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
1106
+
1107
+ return pd.DataFrame({
1108
+ 'ds': future_dates,
1109
+ 'yhat': predictions,
1110
+ 'yhat_lower': predictions * 0.8,
1111
+ 'yhat_upper': predictions * 1.2
1112
+ })
1113
+ """Forecast using GRU model (univariate)"""
1114
+ if not TF_AVAILABLE:
1115
+ raise RuntimeError('TensorFlow not available')
1116
+
1117
+ # Prepare data
1118
+ data = ts['y'].values.reshape(-1, 1)
1119
+ scaler = MinMaxScaler(feature_range=(0, 1))
1120
+ scaled_data = scaler.fit_transform(data)
1121
+
1122
+ # Create sequences
1123
+ X, y = create_sequences(scaled_data, seq_length)
1124
+
1125
+ if len(X) < 10: # Not enough data
1126
+ return forecast_naive(ts, periods)
1127
+
1128
+ # Split data
1129
+ train_size = int(len(X) * 0.8)
1130
+ X_train, X_test = X[:train_size], X[train_size:]
1131
+ y_train, y_test = y[:train_size], y[train_size:]
1132
+
1133
+ # Reshape for GRU [samples, time steps, features]
1134
+ X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
1135
+ X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
1136
+
1137
+ # Build GRU model
1138
+ model = Sequential([
1139
+ GRU(50, activation='relu', input_shape=(seq_length, 1)),
1140
+ Dropout(0.2),
1141
+ Dense(1)
1142
+ ])
1143
+
1144
+ model.compile(optimizer='adam', loss='mse')
1145
+
1146
+ # Train model
1147
+ model.fit(X_train, y_train, epochs=50, batch_size=16, verbose=0, validation_data=(X_test, y_test))
1148
+
1149
+ # Make predictions
1150
+ predictions = []
1151
+ current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, 1)
1152
+
1153
+ for _ in range(periods):
1154
+ pred = model.predict(current_sequence, verbose=0)
1155
+ predictions.append(pred[0][0])
1156
+ # Update sequence for next prediction
1157
+ current_sequence = np.roll(current_sequence, -1, axis=1)
1158
+ current_sequence[0, -1, 0] = pred[0][0]
1159
+
1160
+ # Inverse transform predictions
1161
+ predictions = scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
1162
+
1163
+ # Create forecast dataframe
1164
+ last_date = ts['ds'].max()
1165
+ future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
1166
+
1167
+ return pd.DataFrame({
1168
+ 'ds': future_dates,
1169
+ 'yhat': predictions,
1170
+ 'yhat_lower': predictions * 0.8,
1171
+ 'yhat_upper': predictions * 1.2
1172
+ })
1173
+
1174
+
1175
+ def forecast_lstm_multivariate(ts: pd.DataFrame, periods: int = 7, seq_length: int = 7, target_col: str = 'daily_count', hyperparams: dict = None) -> pd.DataFrame:
+     """Forecast using multivariate LSTM model"""
+     if not TF_AVAILABLE:
+         raise RuntimeError('TensorFlow not available')
+
+     # Set default hyperparameters
+     if hyperparams is None:
+         hyperparams = {}
+     seq_length = hyperparams.get('seq_length', seq_length)  # Use hyperparams seq_length if provided
+     epochs = hyperparams.get('epochs', 100)
+     batch_size = hyperparams.get('batch_size', 16)
+     learning_rate = hyperparams.get('learning_rate', 0.001)
+     units = hyperparams.get('units', 100)
+     dropout_rate = hyperparams.get('dropout_rate', 0.2)
+
+     # Select features for multivariate forecasting
+     feature_cols = [col for col in ts.columns if col not in ['ds', 'OpDeviceType', 'Owner', 'Weather', 'EventType']]
+     if target_col not in feature_cols:
+         raise ValueError(f"Target column '{target_col}' not found in features")
+
+     # Prepare data
+     data = ts[feature_cols].values
+     scaler = MinMaxScaler(feature_range=(0, 1))
+     scaled_data = scaler.fit_transform(data)
+
+     # Create sequences
+     X, y = create_sequences(scaled_data, seq_length)
+
+     if len(X) < 10:  # Not enough data
+         return forecast_naive(ts[['ds', target_col]].rename(columns={target_col: 'y'}), periods)
+
+     # Split data
+     train_size = int(len(X) * 0.8)
+     X_train, X_test = X[:train_size], X[train_size:]
+     y_train, y_test = y[:train_size], y[train_size:]
+
+     # Reshape for LSTM [samples, time steps, features]
+     X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], len(feature_cols)))
+     X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], len(feature_cols)))
+
+     # Build multivariate LSTM model
+     model = Sequential([
+         LSTM(units, activation='relu', return_sequences=True, input_shape=(seq_length, len(feature_cols))),
+         Dropout(dropout_rate),
+         LSTM(units//2, activation='relu'),
+         Dropout(dropout_rate),
+         Dense(1)
+     ])
+
+     optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
+     model.compile(optimizer=optimizer, loss='mse')
+
+     # Train model
+     model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0, validation_data=(X_test, y_test))
+
+     # Make predictions
+     predictions = []
+     current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, len(feature_cols))
+
+     for _ in range(periods):
+         pred = model.predict(current_sequence, verbose=0)
+         predictions.append(pred[0][0])
+
+         # Update sequence for next prediction (use predicted value for target, keep other features)
+         new_row = current_sequence[0, -1, :].copy()
+         new_row[feature_cols.index(target_col)] = pred[0][0]  # Update target with prediction
+
+         current_sequence = np.roll(current_sequence, -1, axis=1)
+         current_sequence[0, -1, :] = new_row
+
+     # Inverse transform predictions (only for target column)
+     target_scaler = MinMaxScaler(feature_range=(0, 1))
+     target_scaler.fit(data[:, feature_cols.index(target_col)].reshape(-1, 1))
+     predictions = target_scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
+
+     # Create forecast dataframe
+     last_date = ts['ds'].max()
+     future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
+
+     return pd.DataFrame({
+         'ds': future_dates,
+         'yhat': predictions,
+         'yhat_lower': predictions * 0.8,
+         'yhat_upper': predictions * 1.2
+     })
+
+
+ def forecast_bilstm_multivariate(ts: pd.DataFrame, periods: int = 7, seq_length: int = 7, target_col: str = 'daily_count', hyperparams: dict = None) -> pd.DataFrame:
+     """Forecast using multivariate Bi-LSTM model"""
+     if not TF_AVAILABLE:
+         raise RuntimeError('TensorFlow not available')
+
+     # Set default hyperparameters
+     if hyperparams is None:
+         hyperparams = {}
+     epochs = hyperparams.get('epochs', 100)
+     batch_size = hyperparams.get('batch_size', 16)
+     learning_rate = hyperparams.get('learning_rate', 0.001)
+     units = hyperparams.get('units', 100)
+     dropout_rate = hyperparams.get('dropout_rate', 0.2)
+
+     # Select features for multivariate forecasting
+     feature_cols = [col for col in ts.columns if col not in ['ds', 'OpDeviceType', 'Owner', 'Weather', 'EventType']]
+     if target_col not in feature_cols:
+         raise ValueError(f"Target column '{target_col}' not found in features")
+
+     # Prepare data
+     data = ts[feature_cols].values
+     scaler = MinMaxScaler(feature_range=(0, 1))
+     scaled_data = scaler.fit_transform(data)
+
+     # Create sequences
+     X, y = create_sequences(scaled_data, seq_length)
+
+     if len(X) < 10:  # Not enough data
+         return forecast_naive(ts[['ds', target_col]].rename(columns={target_col: 'y'}), periods)
+
+     # Split data
+     train_size = int(len(X) * 0.8)
+     X_train, X_test = X[:train_size], X[train_size:]
+     y_train, y_test = y[:train_size], y[train_size:]
+
+     # Reshape for Bi-LSTM [samples, time steps, features]
+     X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], len(feature_cols)))
+     X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], len(feature_cols)))
+
+     # Build multivariate Bi-LSTM model
+     model = Sequential([
+         Bidirectional(LSTM(units, activation='relu', return_sequences=True), input_shape=(seq_length, len(feature_cols))),
+         Dropout(dropout_rate),
+         Bidirectional(LSTM(units//2, activation='relu')),
+         Dropout(dropout_rate),
+         Dense(1)
+     ])
+
+     optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
+     model.compile(optimizer=optimizer, loss='mse')
+
+     # Train model
+     model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0, validation_data=(X_test, y_test))
+
+     # Make predictions
+     predictions = []
+     current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, len(feature_cols))
+
+     for _ in range(periods):
+         pred = model.predict(current_sequence, verbose=0)
+         predictions.append(pred[0][0])
+
+         # Update sequence for next prediction
+         new_row = current_sequence[0, -1, :].copy()
+         new_row[feature_cols.index(target_col)] = pred[0][0]
+
+         current_sequence = np.roll(current_sequence, -1, axis=1)
+         current_sequence[0, -1, :] = new_row
+
+     # Inverse transform predictions
+     target_scaler = MinMaxScaler(feature_range=(0, 1))
+     target_scaler.fit(data[:, feature_cols.index(target_col)].reshape(-1, 1))
+     predictions = target_scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
+
+     # Create forecast dataframe
+     last_date = ts['ds'].max()
+     future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
+
+     return pd.DataFrame({
+         'ds': future_dates,
+         'yhat': predictions,
+         'yhat_lower': predictions * 0.8,
+         'yhat_upper': predictions * 1.2
+     })
+
+
+ def forecast_gru_multivariate(ts: pd.DataFrame, periods: int = 7, seq_length: int = 7, target_col: str = 'daily_count', hyperparams: dict = None) -> pd.DataFrame:
+     """Forecast using multivariate GRU model"""
+     if not TF_AVAILABLE:
+         raise RuntimeError('TensorFlow not available')
+
+     # Set default hyperparameters
+     if hyperparams is None:
+         hyperparams = {}
+     epochs = hyperparams.get('epochs', 100)
+     batch_size = hyperparams.get('batch_size', 16)
+     learning_rate = hyperparams.get('learning_rate', 0.001)
+     units = hyperparams.get('units', 100)
+     dropout_rate = hyperparams.get('dropout_rate', 0.2)
+
+     # Select features for multivariate forecasting
+     feature_cols = [col for col in ts.columns if col not in ['ds', 'OpDeviceType', 'Owner', 'Weather', 'EventType']]
+     if target_col not in feature_cols:
+         raise ValueError(f"Target column '{target_col}' not found in features")
+
+     # Prepare data
+     data = ts[feature_cols].values
+     scaler = MinMaxScaler(feature_range=(0, 1))
+     scaled_data = scaler.fit_transform(data)
+
+     # Create sequences
+     X, y = create_sequences(scaled_data, seq_length)
+
+     if len(X) < 10:  # Not enough data
+         return forecast_naive(ts[['ds', target_col]].rename(columns={target_col: 'y'}), periods)
+
+     # Split data
+     train_size = int(len(X) * 0.8)
+     X_train, X_test = X[:train_size], X[train_size:]
+     y_train, y_test = y[:train_size], y[train_size:]
+
+     # Reshape for GRU [samples, time steps, features]
+     X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], len(feature_cols)))
+     X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], len(feature_cols)))
+
+     # Build multivariate GRU model
+     model = Sequential([
+         GRU(units, activation='relu', return_sequences=True, input_shape=(seq_length, len(feature_cols))),
+         Dropout(dropout_rate),
+         GRU(units//2, activation='relu'),
+         Dropout(dropout_rate),
+         Dense(1)
+     ])
+
+     optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
+     model.compile(optimizer=optimizer, loss='mse')
+
+     # Train model
+     model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size, verbose=0, validation_data=(X_test, y_test))
+
+     # Make predictions
+     predictions = []
+     current_sequence = scaled_data[-seq_length:].reshape(1, seq_length, len(feature_cols))
+
+     for _ in range(periods):
+         pred = model.predict(current_sequence, verbose=0)
+         predictions.append(pred[0][0])
+
+         # Update sequence for next prediction
+         new_row = current_sequence[0, -1, :].copy()
+         new_row[feature_cols.index(target_col)] = pred[0][0]
+
+         current_sequence = np.roll(current_sequence, -1, axis=1)
+         current_sequence[0, -1, :] = new_row
+
+     # Inverse transform predictions
+     target_scaler = MinMaxScaler(feature_range=(0, 1))
+     target_scaler.fit(data[:, feature_cols.index(target_col)].reshape(-1, 1))
+     predictions = target_scaler.inverse_transform(np.array(predictions).reshape(-1, 1)).flatten()
+
+     # Create forecast dataframe
+     last_date = ts['ds'].max()
+     future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=periods, freq='D')
+
+     return pd.DataFrame({
+         'ds': future_dates,
+         'yhat': predictions,
+         'yhat_lower': predictions * 0.8,
+         'yhat_upper': predictions * 1.2
+     })
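
Note: the three multivariate forecasters above rely on a `create_sequences(scaled_data, seq_length)` helper and a `forecast_naive` fallback that live elsewhere in `app.py` and are not part of this hunk. Below is a minimal sketch of what such a sliding-window helper typically looks like; the choice of column 0 as the label is an assumption for illustration (it only matches the code above if the target series is the first feature column), not something this diff specifies.

```python
import numpy as np

def create_sequences(data: np.ndarray, seq_length: int):
    """Build sliding-window training pairs from a scaled feature matrix.

    Assumption (not shown in this diff): the label is the next value of
    column 0, i.e. the target series is the first feature column.
    """
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i + seq_length])   # window of shape (seq_length, n_features)
        y.append(data[i + seq_length, 0])  # next step of the target column
    return np.array(X), np.array(y)
```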
scripts/{summarize.py → recommendation.py} RENAMED
File without changes
scripts/summary.py ADDED
@@ -0,0 +1,159 @@
+ import os
+ import pandas as pd
+ from typing import Dict
+ from pathlib import Path
+
+ # Prefer HF router via OpenAI-compatible client. Use env `HF_TOKEN`.
+ # HF_TOKEN loaded lazily to allow dotenv loading after import
+ def get_hf_token():
+     return os.environ.get('HF_TOKEN')
+
+ def openai_summary(text: str, verbosity: str = 'brief', model: str = 'meta-llama/Llama-3.1-8B-Instruct:novita') -> str:
+     HF_TOKEN = get_hf_token()
+     if not HF_TOKEN:
+         return None
+     try:
+         # Import here to avoid requiring OpenAI client unless HF_TOKEN set
+         from openai import OpenAI
+         client = OpenAI(base_url="https://router.huggingface.co/v1", api_key=HF_TOKEN)
+         if verbosity == 'analyze':
+             instruction = 'วิเคราะห์สาเหตุไฟฟ้าจากข้อมูลนี้ สรุปไม่เกิน 3-4 บรรทัด (ไทย) ระบุสาเหตุทางเทคนิค ผลกระทบต่อลูกค้าและระบบ และช่วงเวลา:'
+         elif verbosity == 'recommend':
+             instruction = 'วิเคราะห์สาเหตุไฟฟ้าจากข้อมูลนี้ พร้อมแนะนำการแก้ไข สรุปไม่เกิน 3-4 บรรทัด (ไทย) ระบุสาเหตุทางเทคนิค ผลกระทบต่อลูกค้าและระบบ ช่วงเวลาและข้อเสนอแนะในการป้องกัน:'
+         else:
+             # Fallback for the default 'brief' mode; without this branch `instruction` is undefined and the call fails silently
+             instruction = 'สรุปเหตุการณ์ไฟฟ้าดับจากข้อมูลนี้แบบย่อ ไม่เกิน 3-4 บรรทัด (ไทย):'
+         prompt = f"{instruction}\n\n{text}\n\nสรุป:"
+         completion = client.chat.completions.create(
+             model=model,
+             messages=[{"role": "user", "content": prompt}],
+             max_tokens=1000,
+         )
+         # Extract text from response
+         choice = completion.choices[0]
+         msg = choice.message
+         content = msg.content
+         return content.strip() if content else None
+     except Exception:
+         return None
+
+
+ def summarize_overall(df: pd.DataFrame, use_hf: bool = False, model: str = 'meta-llama/Llama-3.1-8B-Instruct:novita', total_customers: float = None) -> Dict:
+     """Summarize overall outage data with GenAI and reliability metrics."""
+     # Basic statistics
+     total_events = len(df)
+     date_cols = ['OutageDateTime', 'FirstRestoDateTime', 'LastRestoDateTime', 'CreateEventDateTime', 'CloseEventDateTime']
+
+     # Parse dates
+     df_copy = df.copy()
+     for col in date_cols:
+         if col in df_copy.columns:
+             df_copy[col] = pd.to_datetime(df_copy[col], dayfirst=True, errors='coerce')
+
+     # Calculate basic metrics
+     if 'OutageDateTime' in df_copy.columns:
+         date_range = f"{df_copy['OutageDateTime'].min()} ถึง {df_copy['OutageDateTime'].max()}" if pd.notna(df_copy['OutageDateTime'].min()) else "ไม่ระบุ"
+     else:
+         date_range = "ไม่ระบุ"
+
+     # Event types
+     event_types = df_copy.get('EventType', pd.Series()).value_counts().head(5).to_dict()
+
+     # Affected customers
+     total_affected = 0
+     if 'AffectedCustomer' in df_copy.columns:
+         total_affected = pd.to_numeric(df_copy['AffectedCustomer'], errors='coerce').sum()
+
+     # Create summary text for GenAI
+     summary_text = f"""
+     ข้อมูลไฟฟ้าล้มทั้งหมด:
+     - จำนวนเหตุการณ์ทั้งหมด: {total_events}
+     - ช่วงเวลาที่เกิดเหตุการณ์: {date_range}
+     - ประเภทเหตุการณ์หลัก: {', '.join([f'{k}: {v}' for k, v in event_types.items()])}
+     - จำนวนลูกค้าที่ได้รับผลกระทบทั้งหมด: {int(total_affected) if not pd.isna(total_affected) else 'ไม่ระบุ'}
+     """
+
+     # Reliability metrics DataFrame
+     reliability_df = pd.DataFrame()
+     reliability_summary = ""
+
+     if total_customers and total_customers > 0:
+         try:
+             from scripts.compute_reliability import compute_reliability
+             import tempfile
+             import os
+
+             # Save df to temp CSV for compute_reliability
+             with tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False) as f:
+                 df_copy.to_csv(f.name, index=False)
+                 temp_path = f.name
+
+             try:
+                 reliability_results = compute_reliability(temp_path, total_customers=total_customers, exclude_planned=True)
+                 overall_metrics = reliability_results.get('overall', pd.DataFrame())
+                 if not overall_metrics.empty:
+                     row = overall_metrics.iloc[0]
+
+                     # Create reliability DataFrame with proper metric names
+                     reliability_data = [
+                         {
+                             'Metric': 'SAIFI',
+                             'Full Name': 'System Average Interruption Frequency Index',
+                             'Value': f"{row.get('SAIFI', 'N/A'):.4f}",
+                             'Unit': 'ครั้ง/ลูกค้า',
+                             'Description': 'ความถี่เฉลี่ยของการขัดข้องต่อลูกค้า'
+                         },
+                         {
+                             'Metric': 'SAIDI',
+                             'Full Name': 'System Average Interruption Duration Index',
+                             'Value': f"{row.get('SAIDI', 'N/A'):.2f}",
+                             'Unit': 'นาที/ลูกค้า',
+                             'Description': 'ระยะเวลาขัดข้องเฉลี่ยต่อลูกค้า'
+                         },
+                         {
+                             'Metric': 'CAIDI',
+                             'Full Name': 'Customer Average Interruption Duration Index',
+                             'Value': f"{row.get('CAIDI', 'N/A'):.2f}",
+                             'Unit': 'นาที/ครั้ง',
+                             'Description': 'ระยะเวลาขัดข้องเฉลี่ยต่อครั้ง'
+                         },
+                         {
+                             'Metric': 'MAIFI',
+                             'Full Name': 'Momentary Average Interruption Frequency Index',
+                             'Value': f"{row.get('MAIFI', 'N/A'):.4f}",
+                             'Unit': 'ครั้ง/ลูกค้า',
+                             'Description': 'ความถี่เฉลี่ยของการขัดข้องชั่วคราวต่อลูกค้า'
+                         }
+                     ]
+                     reliability_df = pd.DataFrame(reliability_data)
+
+                     reliability_summary = f"""
+                     ดัชนีความน่าเชื่อถือ:
+                     - SAIFI (System Average Interruption Frequency Index): {row.get('SAIFI', 'N/A'):.4f} ครั้ง/ลูกค้า
+                     - SAIDI (System Average Interruption Duration Index): {row.get('SAIDI', 'N/A'):.2f} นาที/ลูกค้า
+                     - CAIDI (Customer Average Interruption Duration Index): {row.get('CAIDI', 'N/A'):.2f} นาที/ครั้ง
+                     - MAIFI (Momentary Average Interruption Frequency Index): {row.get('MAIFI', 'N/A'):.4f} ครั้ง/ลูกค้า
+                     """
+                     summary_text += reliability_summary
+             finally:
+                 os.unlink(temp_path)
+         except Exception as e:
+             reliability_summary = f"ไม่สามารถคำนวณดัชนีความน่าเชื่อถือได้: {str(e)}"
+
+     # Use GenAI for overall summary
+     ai_summary = None
+     if use_hf and get_hf_token():
+         try:
+             instruction = "สรุปภาพรวมข้อมูลไฟฟ้าล้มจากข้อมูลนี้ สรุปเป็นย่อหน้าเดียว (ไทย) ระบุจำนวนเหตุการณ์ สาเหตุหลัก ผลกระทบ และข้อเสนอแนะในการปรับปรุงระบบไฟฟ้า:"
+             prompt = f"{instruction}\n\n{summary_text}\n\nสรุปภาพรวม:"
+             ai_summary = openai_summary(prompt, verbosity='recommend', model=model)
+         except Exception as e:
+             ai_summary = f"ไม่สามารถสร้างสรุปด้วย AI ได้: {str(e)}"
+
+     return {
+         'total_events': total_events,
+         'date_range': date_range,
+         'event_types': event_types,
+         'total_affected_customers': int(total_affected) if not pd.isna(total_affected) else None,
+         'basic_summary': summary_text.strip(),
+         'reliability_summary': reliability_summary.strip() if reliability_summary else None,
+         'reliability_df': reliability_df,
+         'ai_summary': ai_summary,
+     }
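
For reviewers, a minimal usage sketch of the new `summarize_overall` entry point. The file path, column names in the comment, and customer count below are illustrative only; the real app receives the uploaded CSV through Gradio and reads `HF_TOKEN` from the environment.

```python
import pandas as pd
from scripts.summary import summarize_overall

# Hypothetical OMS export with columns such as OutageDateTime, EventType, AffectedCustomer
df = pd.read_csv('data/oms_events.csv')

result = summarize_overall(
    df,
    use_hf=True,                # needs HF_TOKEN set; otherwise ai_summary stays None
    total_customers=1_000_000,  # illustrative denominator for SAIFI/SAIDI/CAIDI/MAIFI
)

print(result['basic_summary'])
if result['ai_summary']:
    print(result['ai_summary'])
if not result['reliability_df'].empty:
    print(result['reliability_df'][['Metric', 'Value', 'Unit']])
```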