Deep Analysis API Documentation

Overview

The Deep Analysis system provides advanced multi-agent analytical capabilities that automatically generate comprehensive reports based on user goals. The system uses DSPy (Declarative Self-improving Language Programs) to orchestrate multiple AI agents and create detailed analytical insights.

Key Features

  • Multi-Agent Analysis: Orchestrates multiple specialized agents (preprocessing, statistical analysis, machine learning, visualization)
  • Template Integration: Uses the user's active templates/agents for analysis
  • Streaming Progress: Real-time progress updates during analysis execution
  • Report Persistence: Stores complete analysis reports in database with metadata
  • HTML Export: Generates downloadable HTML reports with visualizations
  • Credit Tracking: Monitors token usage, costs, and credits consumed

Template Integration

The deep analysis system integrates with the user's active templates through the agent system:

  1. Agent Selection: Uses agents from the user's active template preferences (configured via /templates endpoints)
  2. Default Agents: Falls back to system default agents if the user hasn't configured preferences:
    • preprocessing (both individual and planner variants)
    • statistical_analytics (both individual and planner variants)
    • sk_learn (both individual and planner variants)
    • data_viz (both individual and planner variants)
  3. Template Limits: Respects the 10-template limit for planner performance optimization
  4. Dynamic Planning: The planner automatically selects the most appropriate agents based on the analysis goal and available templates

Analysis Flow

The deep analysis process follows these steps (a sketch of the corresponding streaming progress update is shown after the list):

  1. Question Generation (20% progress): Generates 5 targeted analytical questions based on the user's goal
  2. Planning (40% progress): Creates an optimized execution plan using available agents
  3. Agent Execution (60% progress): Executes analysis using user's active templates
  4. Code Synthesis (80% progress): Combines and optimizes code from all agents
  5. Code Execution (85% progress): Runs the synthesized analysis code
  6. Synthesis (90% progress): Synthesizes results into coherent insights
  7. Conclusion (100% progress): Generates final conclusions and recommendations
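
Each step is surfaced to clients as a streaming progress update. A minimal sketch of one such update is shown below; only the step field is confirmed by the streaming example later in this document, and the other field names and values are illustrative assumptions:

# Illustrative shape of a single streaming progress update.
# Only "step" is confirmed by the streaming example below; the other fields are assumptions.
progress_update = {
    "step": "planning",       # current phase of the analysis
    "progress": 40,           # percentage matching the step weights above
    "message": "Execution plan created using the user's active agents",
}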

API Endpoints

Create Deep Analysis Report

POST /deep_analysis/reports

Creates a new deep analysis report in the database.

Request Body:

{
  "report_uuid": "string",
  "user_id": 123,
  "goal": "Analyze customer churn patterns",
  "status": "completed",
  "deep_questions": "1. What factors...\n2. How does...",
  "deep_plan": "{\n  \"@preprocessing\": {\n    \"create\": [...],\n    \"use\": [...],\n    \"instruction\": \"...\"\n  }\n}",
  "summaries": ["Agent summary 1", "Agent summary 2"],
  "analysis_code": "import pandas as pd\n# Analysis code...",
  "plotly_figures": [{"data": [...], "layout": {...}}],
  "synthesis": ["Synthesis result 1"],
  "final_conclusion": "## Conclusion\nThe analysis reveals...",
  "html_report": "<html>...</html>",
  "report_summary": "Brief summary of findings",
  "progress_percentage": 100,
  "duration_seconds": 120,
  "credits_consumed": 5,
  "error_message": null,
  "model_provider": "anthropic",
  "model_name": "claude-sonnet-4-20250514",
  "total_tokens_used": 15000,
  "estimated_cost": 0.25,
  "steps_completed": ["questions", "planning", "execution", "synthesis", "conclusion"]
}

Response:

{
  "report_id": 1,
  "report_uuid": "uuid-string",
  "user_id": 123,
  "goal": "Analyze customer churn patterns",
  "status": "completed",
  "start_time": "2024-01-01T12:00:00Z",
  "end_time": "2024-01-01T12:02:00Z",
  "duration_seconds": 120,
  "report_summary": "Brief summary of findings",
  "created_at": "2024-01-01T12:02:00Z",
  "updated_at": "2024-01-01T12:02:00Z"
}
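
As a usage illustration, here is a minimal Python sketch for creating a report record. The base URL is an assumption about a local deployment and any authentication is omitted; adapt both to your environment:

import requests

BASE_URL = "http://localhost:8000"  # assumed local deployment; adjust as needed

payload = {
    "report_uuid": "uuid-string",
    "user_id": 123,
    "goal": "Analyze customer churn patterns",
    "status": "completed",
    "progress_percentage": 100,
    "duration_seconds": 120,
}

# Persist a completed analysis report in the database
resp = requests.post(f"{BASE_URL}/deep_analysis/reports", json=payload)
resp.raise_for_status()
print(resp.json()["report_id"])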

Get Deep Analysis Reports

GET /deep_analysis/reports

Retrieves a list of deep analysis reports with optional filtering.

Query Parameters:

  • user_id (optional): Filter by user ID
  • limit (optional): Number of reports to return (1-100, default: 10)
  • offset (optional): Number of reports to skip (default: 0)
  • status (optional): Filter by status ("pending", "running", "completed", "failed")

Response:

[
  {
    "report_id": 1,
    "report_uuid": "uuid-string",
    "user_id": 123,
    "goal": "Analyze customer churn patterns",
    "status": "completed",
    "start_time": "2024-01-01T12:00:00Z",
    "end_time": "2024-01-01T12:02:00Z",
    "duration_seconds": 120,
    "report_summary": "Brief summary of findings",
    "created_at": "2024-01-01T12:02:00Z",
    "updated_at": "2024-01-01T12:02:00Z"
  }
]
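
A minimal sketch of listing a user's completed reports with the query parameters above (the base URL is an assumption):

import requests

BASE_URL = "http://localhost:8000"  # assumed deployment URL

# Fetch the 10 most recent completed reports for user 123
params = {"user_id": 123, "status": "completed", "limit": 10, "offset": 0}
reports = requests.get(f"{BASE_URL}/deep_analysis/reports", params=params).json()

for report in reports:
    print(report["report_id"], report["goal"], report["status"])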

Get User Historical Reports

GET /deep_analysis/reports/user_historical

Retrieves all historical deep analysis reports for a specific user.

Query Parameters:

  • user_id: User ID (required)
  • limit (optional): Number of reports to return (1-100, default: 50)

Get Report by ID

GET /deep_analysis/reports/{report_id}

Retrieves a complete deep analysis report by ID.

Query Parameters:

  • user_id (optional): Ensures report belongs to specified user

Response:

{
  "report_id": 1,
  "report_uuid": "uuid-string",
  "user_id": 123,
  "goal": "Analyze customer churn patterns",
  "status": "completed",
  "start_time": "2024-01-01T12:00:00Z",
  "end_time": "2024-01-01T12:02:00Z",
  "duration_seconds": 120,
  "deep_questions": "1. What factors contribute to churn?\n2. How does churn vary by segment?",
  "deep_plan": "{\n  \"@preprocessing\": {...},\n  \"@statistical_analytics\": {...}\n}",
  "summaries": ["Agent performed data cleaning...", "Statistical analysis revealed..."],
  "analysis_code": "import pandas as pd\n# Complete analysis code",
  "plotly_figures": [{"data": [...], "layout": {...}}],
  "synthesis": ["The analysis shows clear patterns..."],
  "final_conclusion": "## Conclusion\nCustomer churn is primarily driven by...",
  "html_report": "<html>...</html>",
  "report_summary": "Analysis of customer churn patterns reveals...",
  "progress_percentage": 100,
  "credits_consumed": 5,
  "error_message": null,
  "model_provider": "anthropic",
  "model_name": "claude-sonnet-4-20250514",
  "total_tokens_used": 15000,
  "estimated_cost": 0.25,
  "steps_completed": ["questions", "planning", "execution", "synthesis", "conclusion"],
  "created_at": "2024-01-01T12:02:00Z",
  "updated_at": "2024-01-01T12:02:00Z"
}
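
Because plotly_figures is returned as a list of figure dictionaries, a client can rehydrate them for interactive display. A hedged sketch (base URL assumed):

import requests
import plotly.graph_objects as go

BASE_URL = "http://localhost:8000"  # assumed deployment URL

# Fetch the complete report and rebuild interactive figures from the stored dictionaries
report = requests.get(f"{BASE_URL}/deep_analysis/reports/1", params={"user_id": 123}).json()

for fig_dict in report.get("plotly_figures", []):
    fig = go.Figure(fig_dict)  # reconstruct the figure from its data/layout dict
    fig.show()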

Get Report by UUID

GET /deep_analysis/reports/uuid/{report_uuid}

Retrieves a complete deep analysis report by UUID. Same response format as get by ID.

Delete Report

DELETE /deep_analysis/reports/{report_id}

Deletes a deep analysis report.

Query Parameters:

  • user_id (optional): Ensures report belongs to specified user

Response:

{
  "message": "Report 1 deleted successfully"
}

Update Report Status

PUT /deep_analysis/reports/{report_id}/status

Updates the status of a deep analysis report.

Request Body:

{
  "status": "completed"
}

Valid Status Values:

  • pending: Analysis queued but not started
  • running: Analysis in progress
  • completed: Analysis finished successfully
  • failed: Analysis encountered errors
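
A minimal sketch of updating a report's status from a client (base URL assumed):

import requests

BASE_URL = "http://localhost:8000"  # assumed deployment URL

# Mark report 1 as completed once the analysis has finished
resp = requests.put(
    f"{BASE_URL}/deep_analysis/reports/1/status",
    json={"status": "completed"},
)
resp.raise_for_status()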

Get HTML Report

GET /deep_analysis/reports/uuid/{report_uuid}/html

Retrieves only the HTML report content for a specific analysis.

Query Parameters:

  • user_id (optional): Ensures report belongs to specified user

Response:

{
  "html_report": "<html>...</html>",
  "filename": "deep_analysis_report_20240101_120200.html"
}

Download HTML Report

POST /deep_analysis/download_from_db/{report_uuid}

Downloads the HTML report as a file attachment.

Query Parameters:

  • user_id (optional): Ensures report belongs to specified user

Response:

  • Content-Type: text/html; charset=utf-8
  • Content-Disposition: attachment; filename="deep_analysis_report_TIMESTAMP.html"
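
A hedged sketch of downloading the HTML report and saving it to disk (base URL and local filename are assumptions):

import requests

BASE_URL = "http://localhost:8000"  # assumed deployment URL
report_uuid = "uuid-string"

# Request the HTML attachment and write it to a local file
resp = requests.post(
    f"{BASE_URL}/deep_analysis/download_from_db/{report_uuid}",
    params={"user_id": 123},
)
resp.raise_for_status()

with open("deep_analysis_report.html", "wb") as f:
    f.write(resp.content)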

Deep Analysis Module Architecture

DSPy Signatures

The system uses several DSPy signatures for different analysis phases:

1. deep_questions

Generates 5 targeted analytical questions based on the user's goal and dataset structure.
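
The actual signature definitions live in the backend module. As a rough illustration of the pattern, a hypothetical DSPy signature and its invocation might look like this; the class name, field names, and model string are assumptions, not the real implementation:

import dspy

# Illustrative only: assumes an LM has already been configured, e.g.
# dspy.configure(lm=dspy.LM("anthropic/claude-sonnet-4-20250514"))

class DeepQuestions(dspy.Signature):
    """Generate targeted analytical questions for a user's goal and dataset."""
    goal: str = dspy.InputField(desc="the user's analysis goal")
    dataset_info: str = dspy.InputField(desc="schema and summary of the available dataset")
    questions: str = dspy.OutputField(desc="five targeted analytical questions")

# A ChainOfThought module built from the signature produces the questions
question_generator = dspy.ChainOfThought(DeepQuestions)
result = question_generator(goal="Analyze customer churn patterns",
                            dataset_info="columns: tenure, plan, monthly_charges, churn")
print(result.questions)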

2. deep_planner

Creates an optimized execution plan using the user's active templates/agents. The planner:

  • Verifies feasibility using available datasets and agent descriptions
  • Batches similar questions per agent call for efficiency
  • Reuses outputs across questions to minimize agent calls
  • Defines clear variable flow and dependencies between agents

3. deep_code_synthesizer

Combines and optimizes code from multiple agents:

  • Fixes errors and inconsistencies between agent outputs
  • Ensures proper data flow and type handling
  • Converts all visualizations to Plotly format
  • Adds comprehensive error handling and validation

4. deep_synthesizer

Synthesizes analysis results into coherent insights and findings.

5. final_conclusion

Generates final conclusions and strategic recommendations based on all analysis results.

Streaming Analysis

The execute_deep_analysis_streaming method provides real-time progress updates:

async for update in deep_analysis.execute_deep_analysis_streaming(goal, dataset_info, session_df):
    if update["step"] == "questions":
        ...  # handle question generation progress
    elif update["step"] == "planning":
        ...  # handle planning progress
    elif update["step"] == "agent_execution":
        ...  # handle agent execution progress
    # ... handle other steps

Integration with User Templates

The deep analysis system integrates with user templates in several ways (a minimal fallback sketch follows the list):

  1. Agent Discovery: Retrieves user's active template preferences from the database
  2. Dynamic Planning: The planner uses available agents to create optimal execution plans
  3. Template Validation: Ensures all referenced agents exist in the user's active templates
  4. Fallback Handling: Uses default agents if user preferences are incomplete
  5. Performance Optimization: Respects template limits for efficient execution
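
As a rough illustration of the fallback behaviour, a hypothetical helper might look like the following; the function name, data shape, and default list are assumptions based on the defaults described earlier:

# Hypothetical helper illustrating the fallback to default agents (names are assumptions)
DEFAULT_AGENTS = ["preprocessing", "statistical_analytics", "sk_learn", "data_viz"]
MAX_PLANNER_TEMPLATES = 10  # planner performance limit described above

def select_agents(user_template_preferences):
    """Return the user's active agents, or the defaults if none are configured."""
    active = [t for t in (user_template_preferences or []) if t.get("is_active")]
    if not active:
        return DEFAULT_AGENTS
    return [t["template_name"] for t in active][:MAX_PLANNER_TEMPLATES]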

Error Handling

The system includes comprehensive error handling (a sketch of the fix-and-retry behaviour follows the list):

  • Code Execution Errors: Automatically attempts to fix and retry failed code
  • Template Missing: Falls back to default agents if user templates are unavailable
  • Timeout Protection: Includes timeouts for long-running operations
  • Memory Management: Handles large datasets and visualizations efficiently
  • Unicode Handling: Cleans problematic characters that might cause encoding issues
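
As a rough illustration of the fix-and-retry behaviour for code execution errors, a hypothetical loop is sketched below; the function names and retry count are assumptions, not the actual implementation:

# Hypothetical fix-and-retry loop for failed analysis code (names are assumptions)
MAX_RETRIES = 2

def run_with_retries(code, execute_code, fix_code):
    """Execute analysis code, asking the model-backed fixer to repair it on failure."""
    for attempt in range(MAX_RETRIES + 1):
        try:
            return execute_code(code)
        except Exception as error:
            if attempt == MAX_RETRIES:
                raise
            # Ask the fixer to repair the code using the error message
            code = fix_code(code, str(error))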

Visualization Integration

All visualizations are standardized to Plotly format (a small styling sketch follows the list):

  • Consistent styling and color schemes
  • Interactive features (zoom, pan, hover)
  • Accessibility compliance (colorblind-friendly palettes)
  • Export capabilities for reports
  • Responsive design for different screen sizes
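
A small sketch of the kind of Plotly standardization described above; the template and colour choices are illustrative, not the project's actual styling:

import plotly.graph_objects as go

# Colorblind-friendly palette (illustrative, not the project's actual palette)
COLORWAY = ["#0072B2", "#E69F00", "#009E73", "#CC79A7"]

def standardize_figure(fig: go.Figure) -> go.Figure:
    """Apply a consistent, report-friendly layout to a Plotly figure."""
    fig.update_layout(
        template="plotly_white",   # consistent styling across reports
        colorway=COLORWAY,         # accessible colours
        autosize=True,             # responsive sizing
        hovermode="closest",       # interactive hover behaviour
    )
    return fig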

Frontend Integration

The deep analysis system includes React components for:

  • DeepAnalysisSidebar: Main interface for starting and managing analyses
  • NewAnalysisForm: Form for initiating new deep analyses
  • CurrentAnalysisView: Real-time progress tracking during analysis
  • HistoryView: Browse and access historical analysis reports
  • AnalysisStep: Individual step progress visualization

The frontend integrates with the streaming API to provide real-time feedback and uses the user's active template configuration for personalized analysis capabilities.

Credit and Cost Tracking

The system tracks detailed usage metrics:

  • Credits Consumed: Number of credits deducted from user account
  • Token Usage: Total tokens used across all model calls
  • Estimated Cost: Dollar cost estimate based on model pricing
  • Model Information: Provider and model name used for analysis
  • Execution Time: Duration of analysis for performance monitoring

This information helps users understand resource consumption and optimize their analysis strategies.
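
As a worked illustration of how the cost figures relate, using the example report above; the per-token price is hypothetical, not actual provider pricing:

# Hypothetical cost arithmetic for the metrics above (price is illustrative)
total_tokens_used = 15_000
price_per_1k_tokens = 0.0167        # assumed blended $/1K tokens, not real pricing

estimated_cost = (total_tokens_used / 1000) * price_per_1k_tokens
print(f"Estimated cost: ${estimated_cost:.2f}")   # roughly $0.25 for the example report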