import streamlit as st
st.markdown(
"""
# Automatic Site Clustering Documentation
## 1. Objective
Group sites into clusters from their geographic coordinates, with a configurable maximum number of sites per cluster.
## 2. When to use this tool
Use this page to build operational clusters for planning, field operations, or optimization workloads.
## 3. Input files and accepted formats
- Required: one Excel file in `.xlsx` format
- Sample: `samples/Site_Clustering.xlsx`
## 4. Required columns/fields
You must select:
- latitude column
- longitude column
- region column
- site code column
## 5. Step-by-step usage
1. Open `Apps > Automatic Site Clustering`.
2. Upload the `.xlsx` dataset.
3. Select columns and set `Max sites per cluster`.
4. Choose a clustering method:
   - uniform cluster size (Hilbert curve)
   - non-uniform cluster sizes below the maximum (KMeans)
5. Optionally enable region mixing.
6. Click `Run Clustering` and download the output.
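
To give an intuition for the uniform-size method in step 4, here is a minimal sketch of Hilbert-curve clustering: coordinates are normalized onto a grid, sites are sorted by their position along the curve, and the sorted list is cut into chunks of at most `Max sites per cluster`. It is an illustration under stated assumptions, not the app's actual implementation. The function names `hilbert_index` and `cluster_sites`, the `(site_code, lat, lon)` tuple layout, and the `order` parameter are all hypothetical.

```python
def hilbert_index(n, x, y):
    # Map grid cell (x, y) to its distance along a Hilbert curve
    # covering an n x n grid, where n is a power of two.
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def cluster_sites(sites, max_sites, order=16):
    # sites: list of (site_code, lat, lon) tuples (assumed layout).
    # Returns {site_code: cluster_number}, clusters numbered from 1.
    if max_sites < 1:
        raise ValueError("max_sites must be at least 1")
    if not sites:
        return {}
    n = 1 << order
    lats = [s[1] for s in sites]
    lons = [s[2] for s in sites]
    lat_min, lat_max = min(lats), max(lats)
    lon_min, lon_max = min(lons), max(lons)

    def scale(value, low, high):
        # Normalize a coordinate to an integer grid position in [0, n - 1].
        if high == low:
            return 0
        return int((value - low) / (high - low) * (n - 1))

    ordered = sorted(
        sites,
        key=lambda s: hilbert_index(
            n, scale(s[2], lon_min, lon_max), scale(s[1], lat_min, lat_max)
        ),
    )
    # Consecutive sites along the curve fall into the same cluster.
    return {s[0]: i // max_sites + 1 for i, s in enumerate(ordered)}
```

In this sketch every cluster has exactly `max_sites` members except possibly the last one, which is why the method yields uniform sizes. Sites that are close along the curve tend to be close on the map, but the reverse is not guaranteed.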
## 6. Outputs generated
- clustered dataset with a `Cluster` column
- cluster size charts
- map visualization by cluster
- downloadable file: `clustered_sites.xlsx`
## 7. Frequent errors and fixes
- Invalid map or missing points.
  - Fix: verify that the latitude and longitude values are numeric.
- Unexpected cluster composition.
  - Fix: adjust `Max sites per cluster` or switch to the other clustering method.
- Empty output.
  - Fix: ensure the uploaded file is not empty and the selected columns are correct.
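
The first fix above can be done before uploading. The following is a minimal sketch of such a check, assuming the rows have already been read from the spreadsheet as `(site_code, latitude, longitude)` tuples. The function name `find_invalid_coordinates` and the reason labels are hypothetical and not part of the app.

```python
import math


def find_invalid_coordinates(rows):
    # rows: iterable of (site_code, latitude, longitude) with raw cell values.
    # Returns a list of (site_code, reason) for rows that would not plot.
    invalid = []
    for site_code, lat, lon in rows:
        try:
            lat_f, lon_f = float(lat), float(lon)
        except (TypeError, ValueError):
            # Covers empty cells (None), text, and comma decimals like "5,3".
            invalid.append((site_code, "not numeric"))
            continue
        if math.isnan(lat_f) or math.isnan(lon_f):
            invalid.append((site_code, "missing value"))
        elif not (-90 <= lat_f <= 90 and -180 <= lon_f <= 180):
            invalid.append((site_code, "out of range"))
    return invalid
```

Any site reported here is a likely cause of missing points on the map. Swapped latitude and longitude columns often show up as "out of range" rows.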
## 8. Minimal reproducible example
- Input: `samples/Site_Clustering.xlsx`
- Action: run with the default `max_sites=25` and region mixing disabled.
- Expected result: cluster assignment, charts, map, and downloadable Excel.
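
As a sanity check on the result, the number of clusters can be bounded from the site count: no method can produce fewer than `ceil(n_sites / max_sites)` clusters while respecting the maximum. The sketch below shows the arithmetic. The helper names are hypothetical, the size breakdown assumes the uniform method fills clusters completely before starting the next one, and 110 is a made-up site count, not the size of the sample file.

```python
def min_cluster_count(n_sites, max_sites):
    # Smallest number of clusters that keeps every cluster <= max_sites.
    if max_sites < 1:
        raise ValueError("max_sites must be at least 1")
    return -(-n_sites // max_sites)  # integer ceiling division


def uniform_cluster_sizes(n_sites, max_sites):
    # Sizes when clusters are filled to the maximum one after another
    # (an assumption about the uniform method, for illustration only).
    full, rest = divmod(n_sites, max_sites)
    return [max_sites] * full + ([rest] if rest else [])


# Example with a made-up count of 110 sites and max_sites=25:
print(min_cluster_count(110, 25))
print(uniform_cluster_sizes(110, 25))
```

With region mixing disabled, the same bound would apply to each region separately, so the total can be higher than the bound computed over all sites.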
## 9. Known limitations
- KMeans results can vary with the data distribution.
- The Hilbert curve strategy is based on coordinate normalization.
- Extreme outliers can make clusters harder to interpret.
## 10. Version and update date
- Documentation version: 1.0
- Last update: 2026-02-23
"""
)