Auto-Analyst Backend System Architecture
Overview
Auto-Analyst is a sophisticated multi-agent AI platform designed for comprehensive data analysis. The backend system orchestrates specialized AI agents, manages user sessions, and provides a robust API for data processing and analysis workflows.
High-Level Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend      │    │    Database     │
│    (Next.js)    │◄──►│    (FastAPI)     │◄──►│  (PostgreSQL/   │
│                 │    │                  │    │    SQLite)      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │    AI Models     │
                       │   (DSPy/LLMs)    │
                       └──────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │   Agent System   │
                       │  [Processing]    │
                       │  [Analytics]     │
                       │  [Visualization] │
                       └──────────────────┘
Core Components
1. Application Layer (app.py)
FastAPI Application Server
- Role: Main HTTP server and request router
- Responsibilities:
- Request/response handling
- Session-based authentication
- Route registration and middleware
- Error handling and logging
- Static file serving
- CORS configuration
Key Features:
- Async/await support for high concurrency
- Automatic API documentation generation
- Request validation with Pydantic
- Session management for user tracking
2. Agent System (src/agents/)
Multi-Agent Orchestration
- Core Agents: Specialized AI agents for different analysis tasks
- Deep Analysis: Advanced multi-agent coordination system
- Template System: User-customizable agent configurations
Agent Types
Individual Agents (agents.py):
- preprocessing_agent: Data cleaning and preparation
- statistical_analytics_agent: Statistical analysis
- sk_learn_agent: Machine learning with scikit-learn
- data_viz_agent: Data visualization
- basic_qa_agent: General Q&A
Planner Agents (Multi-agent coordination):
- planner_preprocessing_agent
- planner_statistical_analytics_agent
- planner_sk_learn_agent
- planner_data_viz_agent
Deep Analysis System (deep_agents.py):
- deep_questions: Question generation
- deep_planner: Execution planning
- deep_code_synthesizer: Code combination
- deep_synthesizer: Result synthesis
- final_conclusion: Report generation
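The agent lists above can be held in a simple lookup structure. The following is a minimal sketch rather than the project's registry: the dictionary names and the `planner_variant` helper are hypothetical. The helper encodes only the naming convention visible in the lists, where each planner agent is an individual agent's name prefixed with `planner_`, and `basic_qa_agent` has no listed planner counterpart.

```python
from typing import Dict, List, Optional, Set

# Hypothetical registry populated from the agent names listed above.
INDIVIDUAL_AGENTS: Dict[str, str] = {
    "preprocessing_agent": "Data cleaning and preparation",
    "statistical_analytics_agent": "Statistical analysis",
    "sk_learn_agent": "Machine learning with scikit-learn",
    "data_viz_agent": "Data visualization",
    "basic_qa_agent": "General Q&A",
}

PLANNER_AGENTS: Set[str] = {
    "planner_preprocessing_agent",
    "planner_statistical_analytics_agent",
    "planner_sk_learn_agent",
    "planner_data_viz_agent",
}

# Deep analysis agents, in the order of the Deep Analysis Flow.
DEEP_ANALYSIS_STAGES: List[str] = [
    "deep_questions",
    "deep_planner",
    "deep_code_synthesizer",
    "deep_synthesizer",
    "final_conclusion",
]


def planner_variant(agent_name: str) -> Optional[str]:
    """Return the planner counterpart of an individual agent, or None if none is listed."""
    candidate = f"planner_{agent_name}"
    return candidate if candidate in PLANNER_AGENTS else None


print(planner_variant("data_viz_agent"))  # planner_data_viz_agent
print(planner_variant("basic_qa_agent"))  # None
```

Keeping the three groups separate mirrors the split between single-agent calls, planner-coordinated runs, and the deep analysis pipeline.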
Agent Architecture Pattern
import dspy

class AgentSignature(dspy.Signature):
"""Agent description and purpose"""
goal = dspy.InputField(desc="Analysis objective")
dataset = dspy.InputField(desc="Dataset information")
plan_instructions = dspy.InputField(desc="Execution plan")
summary = dspy.OutputField(desc="Analysis summary")
code = dspy.OutputField(desc="Generated code")
3. Database Layer (src/db/)
Data Persistence and Management
Database Models (schemas/models.py):
# Core Models
User # User accounts and authentication
Chat # Conversation sessions
Message # Individual messages in chats
ModelUsage # AI model usage tracking
# Template System
AgentTemplate # Agent definitions and configurations
UserTemplatePreference # User's enabled/disabled agents
# Deep Analysis
DeepAnalysisReport # Analysis reports and results
# Analytics
CodeExecution # Code execution tracking
UserAnalytics # User behavior analytics
Database Architecture:
Users (1) ──────── (Many) Chats
  │                         │
  │                         ▼
  ├─── (Many) ModelUsage ───┘
  │
  └─── (Many) UserTemplatePreference
                         │
                         ▼
                   AgentTemplate
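The relationships in the diagram map onto foreign keys. The sketch below uses the standard-library `sqlite3` module with an in-memory database instead of the project's SQLAlchemy models. All table and column names are hypothetical. A `messages` table is included because the model list describes `Message` as individual messages in chats.

```python
import sqlite3

# Hypothetical schema mirroring the diagram; column names are illustrative only.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);
CREATE TABLE chats (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id)
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    chat_id INTEGER NOT NULL REFERENCES chats(id),
    content TEXT NOT NULL
);
CREATE TABLE model_usage (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    chat_id INTEGER REFERENCES chats(id),
    model_name TEXT NOT NULL,
    total_tokens INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE agent_templates (
    id INTEGER PRIMARY KEY,
    template_name TEXT UNIQUE NOT NULL
);
CREATE TABLE user_template_preferences (
    user_id INTEGER NOT NULL REFERENCES users(id),
    template_id INTEGER NOT NULL REFERENCES agent_templates(id),
    is_enabled INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (user_id, template_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO users (id, email) VALUES (1, 'analyst@example.com')")
conn.executemany("INSERT INTO chats (id, user_id) VALUES (?, 1)", [(1,), (2,)])
conn.execute("INSERT INTO agent_templates (id, template_name) VALUES (1, 'preprocessing_agent')")
conn.execute("INSERT INTO user_template_preferences (user_id, template_id) VALUES (1, 1)")

# One user, many chats.
chat_count = conn.execute("SELECT COUNT(*) FROM chats WHERE user_id = 1").fetchone()[0]
print(chat_count)  # 2
```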
4. Route Handlers (src/routes/)
RESTful API Endpoints
| Module | Purpose | Key Endpoints |
|---|---|---|
| core_routes.py | Core functionality | /upload_excel, /session_info, /health |
| chat_routes.py | Chat management | /chats, /messages, /delete_chat |
| code_routes.py | Code operations | /execute_code, /get_latest_code |
| templates_routes.py | Agent templates | /templates, /user/{id}/enabled |
| deep_analysis_routes.py | Deep analysis | /reports, /download_from_db |
| analytics_routes.py | System analytics | /usage, /feedback, /costs |
| feedback_routes.py | User feedback | /feedback, /message/{id}/feedback |
5. Business Logic Layer (src/managers/)
Service Layer for Complex Operations
Manager Components:
chat_manager.py:
- Session management
- Message handling
- Context preservation
- Agent orchestration
ai_manager.py:
- Model selection and routing
- Token tracking and cost calculation
- Error handling and retries
- Response formatting
session_manager.py:
- Session lifecycle management
- Data sharing between agents
- Memory management
- Cleanup operations
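The token tracking and cost calculation attributed to `ai_manager.py` can be illustrated with a per-million-token price table. This is a minimal sketch. The model names and prices are placeholders, not the project's configuration or any provider's actual pricing, and `UsageRecord` and `calculate_cost` are hypothetical names.

```python
from dataclasses import dataclass

# Placeholder prices in USD per one million tokens (hypothetical values).
PRICES_PER_MILLION = {
    "example-small-model": {"input": 0.50, "output": 1.50},
    "example-large-model": {"input": 3.00, "output": 15.00},
}


@dataclass
class UsageRecord:
    model_name: str
    prompt_tokens: int
    completion_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens


def calculate_cost(record: UsageRecord) -> float:
    """Cost = prompt tokens x input price + completion tokens x output price (per million)."""
    prices = PRICES_PER_MILLION.get(record.model_name)
    if prices is None:
        raise KeyError(f"No pricing configured for {record.model_name!r}")
    return (
        record.prompt_tokens * prices["input"]
        + record.completion_tokens * prices["output"]
    ) / 1_000_000


usage = UsageRecord("example-large-model", prompt_tokens=2_000, completion_tokens=500)
print(usage.total_tokens)               # 2500
print(round(calculate_cost(usage), 6))  # 0.0135
```

Pricing input and output tokens separately matters because providers commonly charge different rates for each.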
6. Utility Layer (src/utils/)
Shared Services and Helpers
- logger.py: Centralized logging system
- generate_report.py: HTML report generation
- model_registry.py: AI model configuration
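A centralized logging module typically hands out named loggers that share one handler configuration. The sketch below uses only the standard `logging` module. The `get_logger` function and the format string are assumptions, not the contents of `logger.py`.

```python
import logging
import sys

_LOG_FORMAT = "%(asctime)s %(levelname)s [%(name)s] %(message)s"


def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a named logger, attaching a stdout handler only on first use."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(logging.Formatter(_LOG_FORMAT))
        logger.addHandler(handler)
        logger.setLevel(level)
        logger.propagate = False  # stop the root logger from emitting the same record again
    return logger


log = get_logger("chat_manager")
log.info("Session started")
```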
Data Flow Architecture
1. Request Processing Flow
HTTP Request → FastAPI Router → Route Handler → Manager/Business Logic →
Database/Agent System → AI Model → Response Processing → JSON Response
2. Agent Execution Flow
User Query → Session Creation → Template Selection → Agent Loading →
Code Generation → Code Execution → Result Processing → Response Formatting
3. Deep Analysis Flow
Analysis Goal → Question Generation → Planning Phase → Agent Coordination →
Code Synthesis → Execution → Result Synthesis → Final Report Generation
4. Template System Flow
User Preferences → Template Loading → Agent Registration →
Capability Mapping → Execution Routing → Usage Tracking
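The first steps of the template flow (loading templates, applying user preferences, registering enabled agents) can be sketched as a filter over template records. The record fields follow the `agents_config.json` example in the Configuration Management section. The preference mapping, the `register_enabled_agents` function, and the premium template name are hypothetical. The default for templates without a stored preference is likewise an assumption.

```python
from typing import Callable, Dict, List

# Template records shaped like the agents_config.json entries (fields trimmed).
TEMPLATES: List[dict] = [
    {"template_name": "preprocessing_agent", "is_premium": False},
    {"template_name": "data_viz_agent", "is_premium": False},
    {"template_name": "example_premium_agent", "is_premium": True},  # hypothetical
]


def register_enabled_agents(
    templates: List[dict],
    preferences: Dict[str, bool],
    build_agent: Callable[[dict], object],
) -> Dict[str, object]:
    """Map template names to agent objects for the templates a user has enabled.

    Assumption: a template with no stored preference is enabled by default
    unless it is premium.
    """
    registry: Dict[str, object] = {}
    for template in templates:
        name = template["template_name"]
        enabled = preferences.get(name, not template["is_premium"])
        if enabled:
            registry[name] = build_agent(template)
    return registry


user_prefs = {"data_viz_agent": False}  # the user switched visualization off
agents = register_enabled_agents(
    TEMPLATES, user_prefs, build_agent=lambda t: t["template_name"].upper()
)
print(sorted(agents))  # ['preprocessing_agent']
```

Passing `build_agent` as a parameter keeps the filtering step independent of how agents are actually constructed.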
Design Patterns
1. Module Pattern
- Clear separation of concerns
- Each module has specific responsibilities
- Minimal dependencies between modules
2. Repository Pattern
- Database access abstracted through SQLAlchemy
- Session management centralized
- Clean separation of data and business logic
3. Strategy Pattern
- Multiple AI models supported through unified interface
- Agent selection based on user preferences
- Dynamic template loading
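In the strategy pattern, callers depend on one model interface while the concrete provider varies. The sketch below expresses that interface with `typing.Protocol`. The `ChatModel` protocol, the two stub classes, and `select_model` are hypothetical stand-ins that make no network calls; in the project, the unified interface is provided through DSPy.

```python
from typing import Dict, Protocol


class ChatModel(Protocol):
    """Common interface every model strategy must satisfy."""

    def complete(self, prompt: str) -> str: ...


class AnthropicStub:
    """Stand-in for a Claude-backed strategy."""

    def complete(self, prompt: str) -> str:
        return f"[claude] {prompt}"


class OpenAIStub:
    """Stand-in for a GPT-backed strategy."""

    def complete(self, prompt: str) -> str:
        return f"[gpt] {prompt}"


STRATEGIES: Dict[str, ChatModel] = {
    "anthropic": AnthropicStub(),
    "openai": OpenAIStub(),
}


def select_model(provider: str) -> ChatModel:
    """Pick a strategy by provider name; callers only see the ChatModel interface."""
    try:
        return STRATEGIES[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider}") from None


print(select_model("openai").complete("Summarize the dataset"))  # [gpt] Summarize the dataset
```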
4. Observer Pattern
- Usage tracking and analytics
- Event-driven model updates
- Real-time progress notifications
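The observer pattern decouples the code that emits an event (such as a model call finishing) from the code that reacts to it (usage tracking, progress notifications). This is a minimal standard-library sketch. `EventBus`, the `model_call` event name, the payload keys, and both handler functions are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Minimal publish/subscribe hub: observers register per event name."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event: str, handler: Handler) -> None:
        self._subscribers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        for handler in self._subscribers.get(event, []):
            handler(payload)


usage_totals: DefaultDict[str, int] = defaultdict(int)  # observer 1: token usage per model
progress_log: List[str] = []  # observer 2: stand-in for real-time notifications


def track_usage(payload: dict) -> None:
    usage_totals[payload["model"]] += payload["tokens"]


def notify_progress(payload: dict) -> None:
    progress_log.append(f"{payload['agent']} finished")


bus = EventBus()
bus.subscribe("model_call", track_usage)
bus.subscribe("model_call", notify_progress)

bus.publish("model_call", {"model": "example-model", "tokens": 120, "agent": "data_viz_agent"})
bus.publish("model_call", {"model": "example-model", "tokens": 80, "agent": "sk_learn_agent"})
print(dict(usage_totals))  # {'example-model': 200}
print(progress_log)        # ['data_viz_agent finished', 'sk_learn_agent finished']
```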
5. Factory Pattern
- Agent creation based on template configurations
- Session factory for database connections
- Dynamic model instantiation
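Agent creation from template configuration can be sketched as a factory that turns one template record into agent objects. The `Agent` dataclass and `create_agents` function are hypothetical. The interpretation of `variant_type` values (`"individual"`, `"planner"`, or `"both"`, with `"both"` producing an individual agent plus its `planner_` counterpart, matching the paired agent names listed earlier) is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Agent:
    name: str
    description: str
    is_planner: bool


def create_agents(template: dict) -> List[Agent]:
    """Build agent objects from one template record.

    Assumed variant_type values: "individual", "planner", or "both".
    """
    name = template["template_name"]
    description = template.get("description", "")
    variant = template.get("variant_type", "individual")
    agents: List[Agent] = []
    if variant in ("individual", "both"):
        agents.append(Agent(name, description, is_planner=False))
    if variant in ("planner", "both"):
        agents.append(Agent(f"planner_{name}", description, is_planner=True))
    if not agents:
        raise ValueError(f"Unsupported variant_type: {variant!r}")
    return agents


template = {
    "template_name": "preprocessing_agent",
    "description": "Data cleaning and preparation",
    "variant_type": "both",
}
print([agent.name for agent in create_agents(template)])
# ['preprocessing_agent', 'planner_preprocessing_agent']
```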
Configuration Management
Environment Configuration
# Database
DATABASE_URL: str # Database connection string
POSTGRES_PASSWORD: str # PostgreSQL password (optional)
# AI Models
ANTHROPIC_API_KEY: str # Claude API key
OPENAI_API_KEY: str # OpenAI API key
# Authentication
ADMIN_API_KEY: str # Admin operations key (optional)
# Deployment
PORT: int = 8000 # Server port
DEBUG: bool = False # Debug mode
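The settings above can be read from the process environment into a typed object. This standard-library sketch treats the variables not marked optional as required and applies the documented defaults for `PORT` and `DEBUG`. The `Settings` dataclass, `load_settings`, and the accepted truthy strings for `DEBUG` are assumptions; the project may load configuration differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    anthropic_api_key: str
    openai_api_key: str
    postgres_password: Optional[str] = None
    admin_api_key: Optional[str] = None
    port: int = 8000
    debug: bool = False


def _require(env: Mapping[str, str], key: str) -> str:
    value = env.get(key)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {key}")
    return value


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, applying the documented defaults."""
    return Settings(
        database_url=_require(env, "DATABASE_URL"),
        anthropic_api_key=_require(env, "ANTHROPIC_API_KEY"),
        openai_api_key=_require(env, "OPENAI_API_KEY"),
        postgres_password=env.get("POSTGRES_PASSWORD"),
        admin_api_key=env.get("ADMIN_API_KEY"),
        port=int(env.get("PORT", "8000")),
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
    )


# Explicit mapping for the example; in a running server the default os.environ is used.
settings = load_settings({
    "DATABASE_URL": "sqlite:///./dev.db",
    "ANTHROPIC_API_KEY": "test-key",
    "OPENAI_API_KEY": "test-key",
    "DEBUG": "true",
})
print(settings.port, settings.debug)  # 8000 True
```

Failing fast on a missing required variable surfaces misconfiguration at startup rather than on the first request.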
Agent Configuration (agents_config.json)
{
"default_agents": [
{
"template_name": "preprocessing_agent",
"description": "Data cleaning and preparation",
"variant_type": "both",
"is_premium": false,
"usage_count": 0,
"icon_url": "preprocessing.svg"
}
],
"premium_templates": [...],
"remove": [...]
}
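Because the `premium_templates` and `remove` entries are elided above, the sketch below parses only a `default_agents` list. Constructing the dataclass with `**entry` checks that each entry has exactly the fields shown in the example; it does not check value types. `TemplateConfig` and `parse_default_agents` are hypothetical names, not the project's loader.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TemplateConfig:
    template_name: str
    description: str
    variant_type: str
    is_premium: bool
    usage_count: int
    icon_url: str


def parse_default_agents(raw: str) -> List[TemplateConfig]:
    """Parse agents_config.json text; missing or unexpected keys raise TypeError."""
    data = json.loads(raw)
    return [TemplateConfig(**entry) for entry in data.get("default_agents", [])]


SAMPLE = """
{
  "default_agents": [
    {
      "template_name": "preprocessing_agent",
      "description": "Data cleaning and preparation",
      "variant_type": "both",
      "is_premium": false,
      "usage_count": 0,
      "icon_url": "preprocessing.svg"
    }
  ]
}
"""

agents = parse_default_agents(SAMPLE)
print(agents[0].template_name, agents[0].is_premium)  # preprocessing_agent False
```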
Security Architecture
Authentication & Authorization
Session-based Authentication:
- Session IDs for user identification
- Optional API key authentication for admin endpoints
Input Validation:
- Pydantic models for request validation
- SQL injection prevention through SQLAlchemy
- File upload restrictions and validation
Resource Protection:
- User-specific data isolation
- Usage tracking and monitoring
- Rate limiting considerations
Data Security
Database Security:
- Encrypted connections for PostgreSQL
- Parameterized queries
- Regular backup procedures
Code Execution Security:
- Sandboxed code execution environment
- Limited library imports
- Timeout protection
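Two of these safeguards can be approximated with the standard library: an AST pass that rejects imports outside an allowlist, and a separate interpreter process that is killed after a timeout. This sketch uses a hypothetical allowlist and hypothetical function names. An import check plus a subprocess is not a real sandbox: it does not restrict file or network access, and it misses dynamic imports such as `__import__`. It illustrates the idea, not the project's execution environment.

```python
import ast
import subprocess
import sys
from typing import Set

# Hypothetical allowlist of top-level modules generated code may import.
ALLOWED_MODULES = {"math", "statistics", "json", "pandas", "numpy"}


def disallowed_imports(code: str) -> Set[str]:
    """Return top-level module names imported by `code` that are not allowlisted."""
    found: Set[str] = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found - ALLOWED_MODULES


def run_generated_code(code: str, timeout_s: float = 5.0) -> str:
    """Run code in a separate Python process and return its stdout."""
    blocked = disallowed_imports(code)
    if blocked:
        raise PermissionError(f"Imports not allowed: {sorted(blocked)}")
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError(f"Execution exceeded {timeout_s} seconds") from None
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip())
    return result.stdout


print(run_generated_code("import math\nprint(math.sqrt(16))"))  # 4.0
```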
Performance Architecture
Scalability Features
Async Architecture:
- Non-blocking I/O operations
- Concurrent agent execution
- Streaming responses for long operations
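Concurrent agent execution can be sketched with `asyncio.gather`, which runs independent coroutines concurrently and returns their results in the order the coroutines were passed. The `run_agent` coroutine below is a hypothetical stand-in that sleeps instead of calling a model API.

```python
import asyncio
import time
from typing import List


async def run_agent(name: str, delay_s: float) -> str:
    """Hypothetical agent call; asyncio.sleep stands in for awaiting an LLM API."""
    await asyncio.sleep(delay_s)
    return f"{name}: done"


async def run_agents_concurrently(names: List[str], delay_s: float = 0.2) -> List[str]:
    # gather runs the coroutines concurrently, so total time tracks the slowest
    # agent rather than the sum of all agents.
    return list(await asyncio.gather(*(run_agent(name, delay_s) for name in names)))


start = time.perf_counter()
results = asyncio.run(
    run_agents_concurrently(["preprocessing_agent", "statistical_analytics_agent", "data_viz_agent"])
)
elapsed = time.perf_counter() - start
print(results)
```

This only helps when agents are independent. Steps that consume another agent's output, such as the planner-coordinated runs, still have to be awaited in sequence.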
Database Optimization:
- Connection pooling
- Query optimization
- Indexes on frequently accessed columns
Caching Strategy:
- In-memory caching for templates
- Result caching for expensive operations
- Session data management
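In-memory caches for templates and session data usually need expiry so that stale entries are dropped. Below is a minimal time-to-live cache using only the standard library. The `TTLCache` class, its injectable clock, the 300-second default, and the example keys are assumptions, not the project's implementation.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Dictionary-backed cache whose entries expire ttl_s seconds after being set."""

    def __init__(self, ttl_s: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def set(self, key: Hashable, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl_s, value)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazily evict on read
            return None
        return value

    def cleanup(self) -> int:
        """Remove all expired entries and return how many were evicted."""
        now = self._clock()
        expired = [k for k, (expires_at, _) in self._store.items() if now >= expires_at]
        for key in expired:
            del self._store[key]
        return len(expired)


# A fake clock makes expiry deterministic for the example.
fake_now = [0.0]
cache = TTLCache(ttl_s=60, clock=lambda: fake_now[0])
cache.set("templates", ["preprocessing_agent", "data_viz_agent"])
cache.set("session:abc", {"dataset": "sales.csv"})
print(cache.get("templates"))  # ['preprocessing_agent', 'data_viz_agent']
fake_now[0] = 61.0
print(cache.cleanup())         # 2
print(cache.get("templates"))  # None
```

Running `cleanup` periodically, rather than only evicting on read, keeps abandoned sessions from holding memory indefinitely.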
Performance Monitoring
Usage Analytics:
- Request/response time tracking
- Token usage monitoring
- Error rate analysis
Resource Monitoring:
- Database query performance
- Memory usage tracking
- Agent execution time analysis
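Agent execution time analysis can start from a decorator that records how long each call takes. In this sketch, the `timed` decorator and the in-memory `EXECUTION_TIMES` store are hypothetical. A production system would persist the measurements instead of holding them in a dictionary.

```python
import time
from collections import defaultdict
from functools import wraps
from typing import Any, Callable, DefaultDict, List

# Hypothetical in-memory store: agent name -> list of durations in seconds.
EXECUTION_TIMES: DefaultDict[str, List[float]] = defaultdict(list)


def timed(agent_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator factory that records the wall-clock duration of each call."""

    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the agent raises.
                EXECUTION_TIMES[agent_name].append(time.perf_counter() - start)

        return wrapper

    return decorator


@timed("statistical_analytics_agent")
def summarize(values: List[float]) -> float:
    return sum(values) / len(values)


print(summarize([1.0, 2.0, 3.0]))  # 2.0
print(len(EXECUTION_TIMES["statistical_analytics_agent"]))  # 1
```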
Deployment Architecture
Development Environment
Local Development → SQLite Database → File-based Logging →
Direct Model API Calls → Hot Reloading
Production Environment
Load Balancer → Multiple FastAPI Instances → PostgreSQL Database →
Centralized Logging → Monitoring & Alerting
Container Architecture
# Multi-stage build for optimization
FROM python:3.11-slim AS base
# Dependencies and application setup
# Health checks and graceful shutdown
# Environment-specific configurations
Integration Patterns
External Service Integration
AI Model Providers:
- Anthropic (Claude)
- OpenAI (GPT models)
- Unified interface through DSPy
Database Systems:
- PostgreSQL (production)
- SQLite (development)
- Migration support through Alembic
Frontend Integration
REST API:
- Standard HTTP endpoints
- JSON request/response format
- Session-based communication
Data Exchange:
- File upload capabilities
- Real-time analysis results
- Report generation and download
Third-Party Integration
Python Data Science Stack:
- Pandas for data manipulation
- NumPy for numerical computing
- Scikit-learn for machine learning
- Plotly for visualization
- Statsmodels for statistical analysis
Development Tools:
- Alembic for database migrations
- SQLAlchemy for ORM
- FastAPI for web framework
- Pydantic for data validation
Documentation Architecture
API Documentation
- Auto-generated Docs: Available at the /docs endpoint
- Schema Definitions: Pydantic models with descriptions
- Endpoint Documentation: Detailed parameter and response docs
Code Documentation
- Inline Documentation: Comprehensive docstrings
- Architecture Guides: High-level system design documentation
- Getting Started: Developer onboarding documentation
- Troubleshooting: Common issues and solutions
This architecture provides a robust, scalable foundation for multi-agent AI analysis while maintaining clean separation of concerns and supporting both development and production deployment scenarios.