Spaces:

FireBird-Tech
/

auto-analyst-backend

Running on CPU Upgrade

App Files Files Community

auto-analyst-backend / docs /api /routes /deep_analysis.md

GitHub Actions

Merge branch 'FireBird-Technologies:main' into main

7643a03 11 days ago

preview code

raw

history blame contribute delete

12.1 kB

	# Deep Analysis API Documentation

	## Overview

	The Deep Analysis system provides advanced multi-agent analytical capabilities that automatically generate comprehensive reports based on user goals. The system uses DSPy (Declarative Self-improving Language Programs) to orchestrate multiple AI agents and create detailed analytical insights.

	## Key Features

	- Multi-Agent Analysis: Orchestrates multiple specialized agents (preprocessing, statistical analysis, machine learning, visualization)
	- Template Integration: Uses the user's active templates/agents for analysis
	- Streaming Progress: Real-time progress updates during analysis execution
	- Report Persistence: Stores complete analysis reports in database with metadata
	- HTML Export: Generates downloadable HTML reports with visualizations
	- Credit Tracking: Monitors token usage, costs, and credits consumed

	## Template Integration

	The deep analysis system integrates with the user's active templates through the agent system:

	1. Agent Selection: Uses agents from the user's active template preferences (configured via `/templates` endpoints)
	2. Default Agents: Falls back to system default agents if user hasn't configured preferences:
	- `preprocessing` (both individual and planner variants)
	- `statistical_analytics` (both individual and planner variants)
	- `sk_learn` (both individual and planner variants)
	- `data_viz` (both individual and planner variants)
	3. Template Limits: Respects the 10-template limit for planner performance optimization
	4. Dynamic Planning: The planner automatically selects the most appropriate agents based on the analysis goal and available templates

	## Analysis Flow

	The deep analysis process follows these steps:

	1. Question Generation (20% progress): Generates 5 targeted analytical questions based on the user's goal
	2. Planning (40% progress): Creates an optimized execution plan using available agents
	3. Agent Execution (60% progress): Executes analysis using user's active templates
	4. Code Synthesis (80% progress): Combines and optimizes code from all agents
	5. Code Execution (85% progress): Runs the synthesized analysis code
	6. Synthesis (90% progress): Synthesizes results into coherent insights
	7. Conclusion (100% progress): Generates final conclusions and recommendations

	---

	## API Endpoints

	### Create Deep Analysis Report

	POST `/deep_analysis/reports`

	Creates a new deep analysis report in the database.

	Request Body:
	```json
	{
	"report_uuid": "string",
	"user_id": 123,
	"goal": "Analyze customer churn patterns",
	"status": "completed",
	"deep_questions": "1. What factors...\n2. How does...",
	"deep_plan": "{\n \"@preprocessing\": {\n \"create\": [...],\n \"use\": [...],\n \"instruction\": \"...\"\n }\n}",
	"summaries": ["Agent summary 1", "Agent summary 2"],
	"analysis_code": "import pandas as pd\n# Analysis code...",
	"plotly_figures": [{"data": [...], "layout": {...}}],
	"synthesis": ["Synthesis result 1"],
	"final_conclusion": "## Conclusion\nThe analysis reveals...",
	"html_report": "<html>...</html>",
	"report_summary": "Brief summary of findings",
	"progress_percentage": 100,
	"duration_seconds": 120,
	"credits_consumed": 5,
	"error_message": null,
	"model_provider": "anthropic",
	"model_name": "claude-sonnet-4-20250514",
	"total_tokens_used": 15000,
	"estimated_cost": 0.25,
	"steps_completed": ["questions", "planning", "execution", "synthesis", "conclusion"]
	}
	```

	Response:
	```json
	{
	"report_id": 1,
	"report_uuid": "uuid-string",
	"user_id": 123,
	"goal": "Analyze customer churn patterns",
	"status": "completed",
	"start_time": "2024-01-01T12:00:00Z",
	"end_time": "2024-01-01T12:02:00Z",
	"duration_seconds": 120,
	"report_summary": "Brief summary of findings",
	"created_at": "2024-01-01T12:02:00Z",
	"updated_at": "2024-01-01T12:02:00Z"
	}
	```

	### Get Deep Analysis Reports

	GET `/deep_analysis/reports`

	Retrieves a list of deep analysis reports with optional filtering.

	Query Parameters:
	- `user_id` (optional): Filter by user ID
	- `limit` (optional): Number of reports to return (1-100, default: 10)
	- `offset` (optional): Number of reports to skip (default: 0)
	- `status` (optional): Filter by status ("pending", "running", "completed", "failed")

	Response:
	```json
	[
	{
	"report_id": 1,
	"report_uuid": "uuid-string",
	"user_id": 123,
	"goal": "Analyze customer churn patterns",
	"status": "completed",
	"start_time": "2024-01-01T12:00:00Z",
	"end_time": "2024-01-01T12:02:00Z",
	"duration_seconds": 120,
	"report_summary": "Brief summary of findings",
	"created_at": "2024-01-01T12:02:00Z",
	"updated_at": "2024-01-01T12:02:00Z"
	}
	]
	```

	### Get User Historical Reports

	GET `/deep_analysis/reports/user_historical`

	Retrieves all historical deep analysis reports for a specific user.

	Query Parameters:
	- `user_id`: User ID (required)
	- `limit` (optional): Number of reports to return (1-100, default: 50)

	### Get Report by ID

	GET `/deep_analysis/reports/{report_id}`

	Retrieves a complete deep analysis report by ID.

	Query Parameters:
	- `user_id` (optional): Ensures report belongs to specified user

	Response:
	```json
	{
	"report_id": 1,
	"report_uuid": "uuid-string",
	"user_id": 123,
	"goal": "Analyze customer churn patterns",
	"status": "completed",
	"start_time": "2024-01-01T12:00:00Z",
	"end_time": "2024-01-01T12:02:00Z",
	"duration_seconds": 120,
	"deep_questions": "1. What factors contribute to churn?\n2. How does churn vary by segment?",
	"deep_plan": "{\n \"@preprocessing\": {...},\n \"@statistical_analytics\": {...}\n}",
	"summaries": ["Agent performed data cleaning...", "Statistical analysis revealed..."],
	"analysis_code": "import pandas as pd\n# Complete analysis code",
	"plotly_figures": [{"data": [...], "layout": {...}}],
	"synthesis": ["The analysis shows clear patterns..."],
	"final_conclusion": "## Conclusion\nCustomer churn is primarily driven by...",
	"html_report": "<html>...</html>",
	"report_summary": "Analysis of customer churn patterns reveals...",
	"progress_percentage": 100,
	"credits_consumed": 5,
	"error_message": null,
	"model_provider": "anthropic",
	"model_name": "claude-sonnet-4-20250514",
	"total_tokens_used": 15000,
	"estimated_cost": 0.25,
	"steps_completed": ["questions", "planning", "execution", "synthesis", "conclusion"],
	"created_at": "2024-01-01T12:02:00Z",
	"updated_at": "2024-01-01T12:02:00Z"
	}
	```

	### Get Report by UUID

	GET `/deep_analysis/reports/uuid/{report_uuid}`

	Retrieves a complete deep analysis report by UUID. Same response format as get by ID.

	### Delete Report

	DELETE `/deep_analysis/reports/{report_id}`

	Deletes a deep analysis report.

	Query Parameters:
	- `user_id` (optional): Ensures report belongs to specified user

	Response:
	```json
	{
	"message": "Report 1 deleted successfully"
	}
	```

	### Update Report Status

	PUT `/deep_analysis/reports/{report_id}/status`

	Updates the status of a deep analysis report.

	Request Body:
	```json
	{
	"status": "completed"
	}
	```

	Valid Status Values:
	- `pending`: Analysis queued but not started
	- `running`: Analysis in progress
	- `completed`: Analysis finished successfully
	- `failed`: Analysis encountered errors

	### Get HTML Report

	GET `/deep_analysis/reports/uuid/{report_uuid}/html`

	Retrieves only the HTML report content for a specific analysis.

	Query Parameters:
	- `user_id` (optional): Ensures report belongs to specified user

	Response:
	```json
	{
	"html_report": "<html>...</html>",
	"filename": "deep_analysis_report_20240101_120200.html"
	}
	```

	### Download HTML Report

	POST `/deep_analysis/download_from_db/{report_uuid}`

	Downloads the HTML report as a file attachment.

	Query Parameters:
	- `user_id` (optional): Ensures report belongs to specified user

	Response:
	- Content-Type: `text/html; charset=utf-8`
	- Content-Disposition: `attachment; filename="deep_analysis_report_TIMESTAMP.html"`

	---

	## Deep Analysis Module Architecture

	### DSPy Signatures

	The system uses several DSPy signatures for different analysis phases:

	#### 1. `deep_questions`
	Generates 5 targeted analytical questions based on the user's goal and dataset structure.

	#### 2. `deep_planner`
	Creates an optimized execution plan using the user's active templates/agents. The planner:
	- Verifies feasibility using available datasets and agent descriptions
	- Batches similar questions per agent call for efficiency
	- Reuses outputs across questions to minimize agent calls
	- Defines clear variable flow and dependencies between agents

	#### 3. `deep_code_synthesizer`
	Combines and optimizes code from multiple agents:
	- Fixes errors and inconsistencies between agent outputs
	- Ensures proper data flow and type handling
	- Converts all visualizations to Plotly format
	- Adds comprehensive error handling and validation

	#### 4. `deep_synthesizer`
	Synthesizes analysis results into coherent insights and findings.

	#### 5. `final_conclusion`
	Generates final conclusions and strategic recommendations based on all analysis results.

	### Streaming Analysis

	The `execute_deep_analysis_streaming` method provides real-time progress updates:

	```python
	async for update in deep_analysis.execute_deep_analysis_streaming(goal, dataset_info, session_df):
	if update["step"] == "questions":
	# Handle questions generation progress
	elif update["step"] == "planning":
	# Handle planning progress
	elif update["step"] == "agent_execution":
	# Handle agent execution progress
	# ... handle other steps
	```

	### Integration with User Templates

	The deep analysis system integrates with user templates in several ways:

	1. Agent Discovery: Retrieves user's active template preferences from the database
	2. Dynamic Planning: The planner uses available agents to create optimal execution plans
	3. Template Validation: Ensures all referenced agents exist in the user's active templates
	4. Fallback Handling: Uses default agents if user preferences are incomplete
	5. Performance Optimization: Respects template limits for efficient execution

	### Error Handling

	The system includes comprehensive error handling:

	- Code Execution Errors: Automatically attempts to fix and retry failed code
	- Template Missing: Falls back to default agents if user templates are unavailable
	- Timeout Protection: Includes timeouts for long-running operations
	- Memory Management: Handles large datasets and visualization efficiently
	- Unicode Handling: Cleans problematic characters that might cause encoding issues

	### Visualization Integration

	All visualizations are standardized to Plotly format:
	- Consistent styling and color schemes
	- Interactive features (zoom, pan, hover)
	- Accessibility compliance (colorblind-friendly palettes)
	- Export capabilities for reports
	- Responsive design for different screen sizes

	---

	## Frontend Integration

	The deep analysis system includes React components for:

	- DeepAnalysisSidebar: Main interface for starting and managing analyses
	- NewAnalysisForm: Form for initiating new deep analyses
	- CurrentAnalysisView: Real-time progress tracking during analysis
	- HistoryView: Browse and access historical analysis reports
	- AnalysisStep: Individual step progress visualization

	The frontend integrates with the streaming API to provide real-time feedback and uses the user's active template configuration for personalized analysis capabilities.

	## Credit and Cost Tracking

	The system tracks detailed usage metrics:
	- Credits Consumed: Number of credits deducted from user account
	- Token Usage: Total tokens used across all model calls
	- Estimated Cost: Dollar cost estimate based on model pricing
	- Model Information: Provider and model name used for analysis
	- Execution Time: Duration of analysis for performance monitoring

	This information helps users understand resource consumption and optimize their analysis strategies.