GitHub Actions
Merge branch 'FireBird-Technologies:main' into main
7643a03
# Deep Analysis API Documentation
## Overview
The Deep Analysis system provides advanced multi-agent analytical capabilities that automatically generate comprehensive reports based on user goals. The system uses DSPy (Declarative Self-improving Language Programs) to orchestrate multiple AI agents and create detailed analytical insights.
## Key Features
- **Multi-Agent Analysis**: Orchestrates multiple specialized agents (preprocessing, statistical analysis, machine learning, visualization)
- **Template Integration**: Uses the user's active templates/agents for analysis
- **Streaming Progress**: Real-time progress updates during analysis execution
- **Report Persistence**: Stores complete analysis reports in database with metadata
- **HTML Export**: Generates downloadable HTML reports with visualizations
- **Credit Tracking**: Monitors token usage, costs, and credits consumed
## Template Integration
The deep analysis system integrates with the user's active templates through the agent system:
1. **Agent Selection**: Uses agents from the user's active template preferences (configured via `/templates` endpoints)
2. **Default Agents**: Falls back to system default agents if user hasn't configured preferences:
- `preprocessing` (both individual and planner variants)
- `statistical_analytics` (both individual and planner variants)
- `sk_learn` (both individual and planner variants)
- `data_viz` (both individual and planner variants)
3. **Template Limits**: Respects the 10-template limit for planner performance optimization
4. **Dynamic Planning**: The planner automatically selects the most appropriate agents based on the analysis goal and available templates
## Analysis Flow
The deep analysis process follows these steps:
1. **Question Generation** (20% progress): Generates 5 targeted analytical questions based on the user's goal
2. **Planning** (40% progress): Creates an optimized execution plan using available agents
3. **Agent Execution** (60% progress): Executes analysis using user's active templates
4. **Code Synthesis** (80% progress): Combines and optimizes code from all agents
5. **Code Execution** (85% progress): Runs the synthesized analysis code
6. **Synthesis** (90% progress): Synthesizes results into coherent insights
7. **Conclusion** (100% progress): Generates final conclusions and recommendations
---
## API Endpoints
### Create Deep Analysis Report
**POST** `/deep_analysis/reports`
Creates a new deep analysis report in the database.
**Request Body:**
```json
{
"report_uuid": "string",
"user_id": 123,
"goal": "Analyze customer churn patterns",
"status": "completed",
"deep_questions": "1. What factors...\n2. How does...",
"deep_plan": "{\n \"@preprocessing\": {\n \"create\": [...],\n \"use\": [...],\n \"instruction\": \"...\"\n }\n}",
"summaries": ["Agent summary 1", "Agent summary 2"],
"analysis_code": "import pandas as pd\n# Analysis code...",
"plotly_figures": [{"data": [...], "layout": {...}}],
"synthesis": ["Synthesis result 1"],
"final_conclusion": "## Conclusion\nThe analysis reveals...",
"html_report": "<html>...</html>",
"report_summary": "Brief summary of findings",
"progress_percentage": 100,
"duration_seconds": 120,
"credits_consumed": 5,
"error_message": null,
"model_provider": "anthropic",
"model_name": "claude-sonnet-4-20250514",
"total_tokens_used": 15000,
"estimated_cost": 0.25,
"steps_completed": ["questions", "planning", "execution", "synthesis", "conclusion"]
}
```
**Response:**
```json
{
"report_id": 1,
"report_uuid": "uuid-string",
"user_id": 123,
"goal": "Analyze customer churn patterns",
"status": "completed",
"start_time": "2024-01-01T12:00:00Z",
"end_time": "2024-01-01T12:02:00Z",
"duration_seconds": 120,
"report_summary": "Brief summary of findings",
"created_at": "2024-01-01T12:02:00Z",
"updated_at": "2024-01-01T12:02:00Z"
}
```
### Get Deep Analysis Reports
**GET** `/deep_analysis/reports`
Retrieves a list of deep analysis reports with optional filtering.
**Query Parameters:**
- `user_id` (optional): Filter by user ID
- `limit` (optional): Number of reports to return (1-100, default: 10)
- `offset` (optional): Number of reports to skip (default: 0)
- `status` (optional): Filter by status ("pending", "running", "completed", "failed")
**Response:**
```json
[
{
"report_id": 1,
"report_uuid": "uuid-string",
"user_id": 123,
"goal": "Analyze customer churn patterns",
"status": "completed",
"start_time": "2024-01-01T12:00:00Z",
"end_time": "2024-01-01T12:02:00Z",
"duration_seconds": 120,
"report_summary": "Brief summary of findings",
"created_at": "2024-01-01T12:02:00Z",
"updated_at": "2024-01-01T12:02:00Z"
}
]
```
### Get User Historical Reports
**GET** `/deep_analysis/reports/user_historical`
Retrieves all historical deep analysis reports for a specific user.
**Query Parameters:**
- `user_id`: User ID (required)
- `limit` (optional): Number of reports to return (1-100, default: 50)
### Get Report by ID
**GET** `/deep_analysis/reports/{report_id}`
Retrieves a complete deep analysis report by ID.
**Query Parameters:**
- `user_id` (optional): Ensures report belongs to specified user
**Response:**
```json
{
"report_id": 1,
"report_uuid": "uuid-string",
"user_id": 123,
"goal": "Analyze customer churn patterns",
"status": "completed",
"start_time": "2024-01-01T12:00:00Z",
"end_time": "2024-01-01T12:02:00Z",
"duration_seconds": 120,
"deep_questions": "1. What factors contribute to churn?\n2. How does churn vary by segment?",
"deep_plan": "{\n \"@preprocessing\": {...},\n \"@statistical_analytics\": {...}\n}",
"summaries": ["Agent performed data cleaning...", "Statistical analysis revealed..."],
"analysis_code": "import pandas as pd\n# Complete analysis code",
"plotly_figures": [{"data": [...], "layout": {...}}],
"synthesis": ["The analysis shows clear patterns..."],
"final_conclusion": "## Conclusion\nCustomer churn is primarily driven by...",
"html_report": "<html>...</html>",
"report_summary": "Analysis of customer churn patterns reveals...",
"progress_percentage": 100,
"credits_consumed": 5,
"error_message": null,
"model_provider": "anthropic",
"model_name": "claude-sonnet-4-20250514",
"total_tokens_used": 15000,
"estimated_cost": 0.25,
"steps_completed": ["questions", "planning", "execution", "synthesis", "conclusion"],
"created_at": "2024-01-01T12:02:00Z",
"updated_at": "2024-01-01T12:02:00Z"
}
```
### Get Report by UUID
**GET** `/deep_analysis/reports/uuid/{report_uuid}`
Retrieves a complete deep analysis report by UUID. Same response format as get by ID.
### Delete Report
**DELETE** `/deep_analysis/reports/{report_id}`
Deletes a deep analysis report.
**Query Parameters:**
- `user_id` (optional): Ensures report belongs to specified user
**Response:**
```json
{
"message": "Report 1 deleted successfully"
}
```
### Update Report Status
**PUT** `/deep_analysis/reports/{report_id}/status`
Updates the status of a deep analysis report.
**Request Body:**
```json
{
"status": "completed"
}
```
**Valid Status Values:**
- `pending`: Analysis queued but not started
- `running`: Analysis in progress
- `completed`: Analysis finished successfully
- `failed`: Analysis encountered errors
### Get HTML Report
**GET** `/deep_analysis/reports/uuid/{report_uuid}/html`
Retrieves only the HTML report content for a specific analysis.
**Query Parameters:**
- `user_id` (optional): Ensures report belongs to specified user
**Response:**
```json
{
"html_report": "<html>...</html>",
"filename": "deep_analysis_report_20240101_120200.html"
}
```
### Download HTML Report
**POST** `/deep_analysis/download_from_db/{report_uuid}`
Downloads the HTML report as a file attachment.
**Query Parameters:**
- `user_id` (optional): Ensures report belongs to specified user
**Response:**
- Content-Type: `text/html; charset=utf-8`
- Content-Disposition: `attachment; filename="deep_analysis_report_TIMESTAMP.html"`
---
## Deep Analysis Module Architecture
### DSPy Signatures
The system uses several DSPy signatures for different analysis phases:
#### 1. `deep_questions`
Generates 5 targeted analytical questions based on the user's goal and dataset structure.
#### 2. `deep_planner`
Creates an optimized execution plan using the user's active templates/agents. The planner:
- Verifies feasibility using available datasets and agent descriptions
- Batches similar questions per agent call for efficiency
- Reuses outputs across questions to minimize agent calls
- Defines clear variable flow and dependencies between agents
#### 3. `deep_code_synthesizer`
Combines and optimizes code from multiple agents:
- Fixes errors and inconsistencies between agent outputs
- Ensures proper data flow and type handling
- Converts all visualizations to Plotly format
- Adds comprehensive error handling and validation
#### 4. `deep_synthesizer`
Synthesizes analysis results into coherent insights and findings.
#### 5. `final_conclusion`
Generates final conclusions and strategic recommendations based on all analysis results.
### Streaming Analysis
The `execute_deep_analysis_streaming` method provides real-time progress updates:
```python
async for update in deep_analysis.execute_deep_analysis_streaming(goal, dataset_info, session_df):
if update["step"] == "questions":
# Handle questions generation progress
elif update["step"] == "planning":
# Handle planning progress
elif update["step"] == "agent_execution":
# Handle agent execution progress
# ... handle other steps
```
### Integration with User Templates
The deep analysis system integrates with user templates in several ways:
1. **Agent Discovery**: Retrieves user's active template preferences from the database
2. **Dynamic Planning**: The planner uses available agents to create optimal execution plans
3. **Template Validation**: Ensures all referenced agents exist in the user's active templates
4. **Fallback Handling**: Uses default agents if user preferences are incomplete
5. **Performance Optimization**: Respects template limits for efficient execution
### Error Handling
The system includes comprehensive error handling:
- **Code Execution Errors**: Automatically attempts to fix and retry failed code
- **Template Missing**: Falls back to default agents if user templates are unavailable
- **Timeout Protection**: Includes timeouts for long-running operations
- **Memory Management**: Handles large datasets and visualization efficiently
- **Unicode Handling**: Cleans problematic characters that might cause encoding issues
### Visualization Integration
All visualizations are standardized to Plotly format:
- Consistent styling and color schemes
- Interactive features (zoom, pan, hover)
- Accessibility compliance (colorblind-friendly palettes)
- Export capabilities for reports
- Responsive design for different screen sizes
---
## Frontend Integration
The deep analysis system includes React components for:
- **DeepAnalysisSidebar**: Main interface for starting and managing analyses
- **NewAnalysisForm**: Form for initiating new deep analyses
- **CurrentAnalysisView**: Real-time progress tracking during analysis
- **HistoryView**: Browse and access historical analysis reports
- **AnalysisStep**: Individual step progress visualization
The frontend integrates with the streaming API to provide real-time feedback and uses the user's active template configuration for personalized analysis capabilities.
## Credit and Cost Tracking
The system tracks detailed usage metrics:
- **Credits Consumed**: Number of credits deducted from user account
- **Token Usage**: Total tokens used across all model calls
- **Estimated Cost**: Dollar cost estimate based on model pricing
- **Model Information**: Provider and model name used for analysis
- **Execution Time**: Duration of analysis for performance monitoring
This information helps users understand resource consumption and optimize their analysis strategies.