import streamlit as st
st.markdown(
"""
# KPI Anomaly Detection Documentation
## Overview
The KPI Anomaly Detection application automatically identifies and analyzes anomalies in Key Performance Indicators (KPIs) using change point detection. It highlights significant shifts in KPI trends that may indicate network issues or other important events.
## Features
### 1. Data Processing
- Supports both CSV and Excel file formats
- Automatic date parsing and data cleaning
- Handles missing values appropriately
- Processes multiple KPIs in a single run
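
The cleaning steps listed above could be sketched with pandas roughly as follows. This assumes the column layout described under Input Requirements (five identifier columns, then one column per KPI). The function name `clean_kpi_frame` and the choice to drop rows with unparseable dates are illustrative assumptions, not necessarily what the application does.

```python
import pandas as pd

# Assumption: five identifier columns (date, controller, BTS, cell, DN)
# come first, followed by one column per KPI.
ID_COLUMNS = 5


def clean_kpi_frame(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    date_col = df.columns[0]
    # Unparseable dates become NaT instead of raising; those rows are dropped.
    df[date_col] = pd.to_datetime(df[date_col], errors="coerce")
    df = df.dropna(subset=[date_col]).copy()
    # Force KPI columns to numeric; text such as "N/A" becomes NaN.
    kpi_cols = df.columns[ID_COLUMNS:]
    df[kpi_cols] = df[kpi_cols].apply(pd.to_numeric, errors="coerce")
    # Sort chronologically so each series is in time order.
    return df.sort_values(date_col).reset_index(drop=True)
```

Coercing with `errors="coerce"` turns stray text into `NaN` instead of aborting the whole file, so one bad value does not block the run.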
### 2. Anomaly Detection
- Utilizes the PELT (Pruned Exact Linear Time) algorithm for change point detection
- Configurable penalty parameter to control sensitivity
- Identifies both sudden and gradual changes in KPI trends
- Filters out insignificant changes based on mean value differences
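
The mean-difference filter in the last bullet can be sketched in plain Python. The 10% relative threshold and the name `filter_change_points` are hypothetical. Breakpoints follow the `ruptures` convention: segment end indices, with the last entry equal to the series length.

```python
from statistics import mean


def filter_change_points(values, breakpoints, min_rel_change=0.1):
    # Hypothetical filter: keep a change point only if the mean after it
    # differs from the mean before it by at least min_rel_change,
    # measured relative to the earlier mean.
    bounds = [0] + list(breakpoints)
    kept = []
    for i in range(1, len(bounds) - 1):
        before = mean(values[bounds[i - 1]:bounds[i]])
        after = mean(values[bounds[i]:bounds[i + 1]])
        # Fall back to an absolute difference when the earlier mean is 0.
        scale = abs(before) if before != 0 else 1.0
        if abs(after - before) / scale >= min_rel_change:
            kept.append(bounds[i])
    return kept
```

This version compares each change point with its original neighboring segments. A stricter implementation would merge the adjacent segments after dropping a point and recompute their means.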
### 3. Visualization
- Interactive time series plots with Plotly
- Visual indicators for detected change points
- Displays initial and final mean values for comparison
- Responsive design for different screen sizes
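
The initial and final mean values can be derived directly from the detected change points. A minimal sketch follows. It assumes the initial mean covers the segment before the first change point and the final mean covers the segment after the last one. The helper name is hypothetical.

```python
from statistics import mean


def initial_and_final_means(values, breakpoints):
    # breakpoints use the ruptures convention (segment end indices,
    # last one == len(values)), so the final entry is not a change point.
    change_points = [b for b in breakpoints if b < len(values)]
    if not change_points:
        # No change detected: both reference lines sit at the overall mean.
        overall = mean(values)
        return overall, overall
    return mean(values[:change_points[0]]), mean(values[change_points[-1]:])
```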
### 4. Reporting
- Export detected anomalies to Excel
- Separate sheets for each KPI
- Includes all relevant data points and change point indicators
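
A per-KPI export could be written with `pandas.ExcelWriter`, as sketched below. Excel limits sheet names to 31 characters and forbids characters such as `[ ] : * ? /` and the backslash, so KPI names are sanitized first. `safe_sheet_name` and `export_anomalies` are hypothetical names. Writing `.xlsx` files requires an engine such as `openpyxl`.

```python
import pandas as pd

EXCEL_SHEET_NAME_LIMIT = 31  # Excel rejects longer sheet names
# Characters Excel does not allow in sheet names; chr(92) is the backslash.
INVALID_SHEET_CHARS = set("[]:*?/") | {chr(92)}


def safe_sheet_name(kpi_name: str) -> str:
    # Replace forbidden characters, truncate, and never return an empty name.
    cleaned = "".join("_" if ch in INVALID_SHEET_CHARS else ch for ch in kpi_name)
    return cleaned[:EXCEL_SHEET_NAME_LIMIT] or "KPI"


def export_anomalies(frames_by_kpi: dict, path: str) -> None:
    # One sheet per KPI, each holding that KPI's rows with anomalies.
    with pd.ExcelWriter(path) as writer:
        for kpi, frame in frames_by_kpi.items():
            frame.to_excel(writer, sheet_name=safe_sheet_name(kpi), index=False)
```

Two KPI names that differ only after the 31st character would map to the same sheet name. A production export would need to de-duplicate them.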
## Input Requirements
### Required File Format
- **CSV or Excel** file containing time series KPI data
- The first 5 columns must be, in order:
  1. Date/Time
  2. Controller ID
  3. BTS ID
  4. Cell ID
  5. DN (Distinguished Name)
- Remaining columns should contain KPI values
### Data Requirements
- At least 30 data points per cell for reliable detection
- Consistent time intervals between measurements
- Numeric values for KPI columns
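
These requirements can be checked up front, before any detection runs. The sketch below assumes a frame already cleaned as described under Data Processing, with the cell ID in the fourth column. `check_requirements` is a hypothetical helper that collects problems instead of raising, so all of them can be reported at once.

```python
import pandas as pd

MIN_POINTS_PER_CELL = 30  # from the Data Requirements above


def check_requirements(df: pd.DataFrame) -> list:
    # Returns human-readable problems; an empty list means all checks pass.
    # Assumes: date in column 1, cell ID in column 4, KPIs from column 6 on.
    problems = []
    date_col, cell_col = df.columns[0], df.columns[3]
    kpi_cols = df.columns[5:]
    non_numeric = [c for c in kpi_cols if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        problems.append(f"non-numeric KPI columns: {non_numeric}")
    for cell, group in df.groupby(cell_col):
        if len(group) < MIN_POINTS_PER_CELL:
            problems.append(f"cell {cell}: only {len(group)} points")
        # More than one distinct gap between timestamps means irregular sampling.
        steps = group[date_col].sort_values().diff().dropna().unique()
        if len(steps) > 1:
            problems.append(f"cell {cell}: irregular time intervals")
    return problems
```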
## Usage
### 1. Upload Data
- Click "Upload KPI file" and select your CSV or Excel file
- The application will automatically detect the file format
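
Format detection by file extension might look like the sketch below. In Streamlit, `st.file_uploader` returns a file-like object with a `.name` attribute. The object itself can therefore be passed as `source` and its name as `name`. The function name and the set of supported extensions are assumptions.

```python
from pathlib import Path

import pandas as pd


def load_kpi_file(source, name: str) -> pd.DataFrame:
    # source: a path or file-like object; name: used only to detect the format.
    suffix = Path(name).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(source)
    if suffix in (".xlsx", ".xls"):
        # Needs openpyxl for .xlsx or xlrd for .xls.
        return pd.read_excel(source)
    raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
```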
### 2. Configure Detection
- Adjust the "Penalty" parameter to control sensitivity:
  - Lower values = more sensitive (more change points detected)
  - Higher values = less sensitive (only major changes detected)
  - The default value of 2.5 works well for most cases
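
The penalty's effect is visible in the objective PELT minimizes: the sum of the segment costs, plus the penalty once per change point. This sketch uses a simple L2 cost (squared deviation from the segment mean) to keep the arithmetic readable. It is not the RBF cost listed under Technical Details, so the numbers do not transfer to the application's default of 2.5.

```python
def segment_cost(segment):
    # L2 cost: sum of squared deviations from the segment mean.
    m = sum(segment) / len(segment)
    return sum((x - m) ** 2 for x in segment)


def penalized_cost(values, breakpoints, pen):
    # Total segment cost + pen per change point.
    # breakpoints: segment end indices, last one == len(values).
    bounds = [0] + list(breakpoints)
    total = sum(segment_cost(values[a:b]) for a, b in zip(bounds, bounds[1:]))
    return total + pen * (len(breakpoints) - 1)
```

For `[0, 0, 0, 0, 1, 1, 1, 1]`, a single segment costs 2.0. Splitting at index 4 costs 0 plus one penalty. With `pen=1` the split (1.0) wins. With `pen=3` it loses (3.0 vs 2.0), so the higher penalty suppresses the change point.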
### 3. Review Results
- The application will display a list of KPIs with detected anomalies
- Select a KPI and cell to view detailed analysis
- The plot shows:
  - KPI values over time (blue line)
  - Detected change points (red markers)
  - Initial mean (gray dotted line)
  - Final mean (black dashed line)
### 4. Export Results
- Click "Generate Excel file with anomalies" to export all detected anomalies
- Each KPI is saved in a separate sheet
- The Excel file includes all data points with change point indicators
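
The change point indicator in the export can be a boolean column that marks the first row of each new segment. The sketch below uses a hypothetical column name, `is_change_point`. It assumes `ruptures`-style breakpoints: segment end indices, with the last equal to the number of rows.

```python
import pandas as pd


def add_change_point_flag(cell_df: pd.DataFrame, breakpoints) -> pd.DataFrame:
    # True on the first row of each new segment; the final breakpoint
    # (== number of rows) marks the end of the data, not a change.
    out = cell_df.reset_index(drop=True)
    flags = [False] * len(out)
    for b in breakpoints:
        if 0 < b < len(out):
            flags[b] = True
    out["is_change_point"] = flags
    return out
```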
## Technical Details
### Algorithm
- Uses the PELT (Pruned Exact Linear Time) algorithm from the `ruptures` library
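- Model: RBF (Radial Basis Function) kernel cost. It detects shifts in the mean of the kernel-embedded signal, so it can catch changes in the overall distribution of a KPI, not only in its mean.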
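- Pruning: PELT discards candidate change points that can no longer be part of the optimal segmentation, which keeps the search fast on long series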
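
PELT itself is short enough to sketch from scratch. The version below is a simplified stand-in: it uses an L2 (mean-shift) cost instead of the RBF kernel and has no minimum segment length. It is not the `ruptures` implementation. With `ruptures`, the equivalent call is along the lines of `rpt.Pelt(model="rbf").fit(signal).predict(pen=2.5)`.

```python
from itertools import accumulate


def pelt_l2(values, pen):
    # Minimal PELT (Killick et al., 2012) with an L2 cost.
    # Returns segment end indices, ruptures-style (last == len(values)).
    n = len(values)
    s1 = [0.0] + list(accumulate(values))                 # prefix sums
    s2 = [0.0] + list(accumulate(x * x for x in values))  # prefix sums of squares

    def cost(a, b):
        # Sum of squared deviations from the mean of values[a:b], in O(1).
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [-pen] + [0.0] * n  # best[t]: optimal penalized cost of values[:t]
    last = [0] * (n + 1)       # last[t]: start of the final segment ending at t
    candidates = [0]
    for t in range(1, n + 1):
        scores = [best[s] + cost(s, t) + pen for s in candidates]
        i = min(range(len(scores)), key=scores.__getitem__)
        best[t], last[t] = scores[i], candidates[i]
        # Pruning: drop any start s with best[s] + cost(s, t) > best[t].
        candidates = [s for s, sc in zip(candidates, scores) if sc - pen <= best[t]]
        candidates.append(t)

    # Backtrack from the end to recover the segment boundaries.
    breakpoints, t = [], n
    while t > 0:
        breakpoints.append(t)
        t = last[t]
    return sorted(breakpoints)
```

The pruning line separates PELT from plain optimal partitioning. A pruned start can never become optimal again because the L2 cost never increases when a segment is split. Without a minimum segment length, one extreme value can become its own segment if its squared deviation is large relative to the penalty.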
### Performance Considerations
- Processing time depends on:
  - Number of cells in the dataset
  - Number of KPIs
  - Length of the time series
- Large datasets may take several minutes to process
- Results are cached for better performance when adjusting parameters
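
Caching makes repeated runs with the same inputs nearly free. Streamlit apps usually do this with the `st.cache_data` decorator. The sketch below shows the same idea with the standard library's `functools.lru_cache` so that it runs outside Streamlit. The detector body is a trivial placeholder (a jump-size threshold, not PELT), and which step the application actually caches is an assumption.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def detect_cached(values: tuple, pen: float) -> tuple:
    # Placeholder detector: indices where consecutive values jump by more
    # than pen. Arguments must be hashable, hence the tuple input.
    return tuple(i for i in range(1, len(values)) if abs(values[i] - values[i - 1]) > pen)
```

The penalty is part of the cache key. Returning to a previously used penalty value is therefore served from the cache, while a new value triggers one fresh computation.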
## Troubleshooting
### Common Issues
1. **No anomalies detected**
   - Try reducing the penalty value
   - Check whether the data contains enough variation
   - Ensure there are at least 30 data points per cell
2. **Too many false positives**
   - Increase the penalty value
   - Check the data for noise or outliers
   - Consider pre-processing the data
3. **File format errors**
   - Ensure the file is not open in another program
   - Check that the file is not corrupted
   - Verify that the column structure matches the requirements
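
One common pre-processing option for noisy KPIs is a centered rolling median. It removes isolated spikes while keeping genuine level shifts sharp. The window size here is an arbitrary assumption. In pandas, the equivalent is `series.rolling(5, center=True, min_periods=1).median()`.

```python
from statistics import median


def rolling_median(values, window=5):
    # Centered rolling median; windows shrink at the edges of the series.
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]
```

For example, with `window=3`, the spike in `[1, 1, 9, 1, 1]` disappears, while the step in `[0, 0, 0, 5, 5, 5]` stays in place.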
## Best Practices
1. Start with the default penalty value (2.5) and adjust as needed
2. For large datasets, consider processing in smaller chunks
3. Review detected anomalies in context with network events or changes
4. Regularly update the application to get the latest improvements
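
The per-cell data requirement suggests that detection runs per cell. If so, splitting a large dataset by cell keeps every time series intact, whereas splitting by rows could cut a cell's history in half. A sketch with a hypothetical `chunk_cells` helper:

```python
def chunk_cells(cell_ids, chunk_size):
    # Split the unique cell IDs into batches of at most chunk_size,
    # preserving first-seen order.
    unique = list(dict.fromkeys(cell_ids))
    return [unique[i:i + chunk_size] for i in range(0, len(unique), chunk_size)]
```

Each batch can then be selected with `df[df[cell_col].isin(batch)]` (where `cell_col` names the cell ID column) and processed separately.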
## Dependencies
- Python 3.7+
- pandas
- numpy
- plotly
- ruptures
- streamlit
"""
)