Auto-Analyst Backend - Getting Started Guide

🎯 Overview

This guide will help you set up and understand the Auto-Analyst backend system. Auto-Analyst is a multi-agent AI platform that orchestrates specialized agents for comprehensive data analysis.

🏗️ Core Concepts

1. Multi-Agent System

The platform uses specialized AI agents:

Preprocessing Agent: Data cleaning and preparation
Statistical Analytics Agent: Statistical analysis and insights
Machine Learning Agent: Scikit-learn based modeling
Data Visualization Agent: Chart and plot generation

2. Template System

Individual Agents: Single-purpose agents for specific tasks
Planner Agents: Multi-agent coordination for complex workflows
User Templates: Customizable agent preferences
Default vs Premium: Core (default) agents are available to all users; premium templates are flagged with is_premium in agents_config.json

3. Session Management

Session-based user tracking
Shared DataFrame context between agents
Conversation history and code execution tracking
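The session ideas above can be pictured with a small in-memory store. This is only a sketch under stated assumptions: `SessionState` and `SessionStore` are hypothetical names invented for illustration, not the classes used in src/managers/, and a plain list stands in for the shared DataFrame.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SessionState:
    """Hypothetical per-session container; not the project's actual model."""
    session_id: str
    dataframe: Any = None  # shared dataset, e.g. a pandas DataFrame
    history: List[Dict[str, str]] = field(default_factory=list)


class SessionStore:
    """In-memory session registry keyed by session ID."""

    def __init__(self) -> None:
        self._sessions: Dict[str, SessionState] = {}

    def get_or_create(self, session_id: Optional[str] = None) -> SessionState:
        # Generate an ID for new visitors; reuse state for returning ones.
        session_id = session_id or uuid.uuid4().hex
        if session_id not in self._sessions:
            self._sessions[session_id] = SessionState(session_id)
        return self._sessions[session_id]


store = SessionStore()
session = store.get_or_create()
session.dataframe = [{"region": "EU", "sales": 120}]  # placeholder for a DataFrame
session.history.append({"agent": "preprocessing_agent", "code": "df = df.dropna()"})

# A second agent looking up the same ID sees the same dataset and history.
same_session = store.get_or_create(session.session_id)
print(same_session.dataframe is session.dataframe, len(same_session.history))
```

Keeping the dataset and the history on one object is what lets several agents work on the same data within a session without re-uploading it.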

4. Deep Analysis System

Multi-step analysis workflow (questions → planning → execution → synthesis)
Streaming progress updates
HTML report generation
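A generator is one plausible way to combine the four-step workflow with streaming progress updates. The sketch below assumes each stage is a function that takes the previous stage's output; `run_deep_analysis` and the demo handlers are hypothetical and are not taken from src/agents/deep_agents.py.

```python
from typing import Callable, Dict, Iterator

# Stage names taken from the workflow described above.
STAGES = ["questions", "planning", "execution", "synthesis"]


def run_deep_analysis(
    goal: str, handlers: Dict[str, Callable[[str], str]]
) -> Iterator[dict]:
    """Run each stage in order, yielding a progress update after each one."""
    context = goal
    for index, stage in enumerate(STAGES, start=1):
        context = handlers[stage](context)  # each stage consumes the previous output
        yield {
            "stage": stage,
            "progress": round(100 * index / len(STAGES)),
            "output": context,
        }


# Hypothetical stand-in handlers; real stages would call the agents.
demo_handlers = {stage: (lambda text, s=stage: f"{s}({text})") for stage in STAGES}

updates = list(run_deep_analysis("sales trends", demo_handlers))
for update in updates:
    print(update["stage"], update["progress"])
```

Because the function yields after every stage, a caller can forward each update to the client as soon as it is available instead of waiting for the final report.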

🚀 Quick Start

1. Installation

# After cloning the repository, navigate to the backend directory
cd Auto-Analyst-CS/auto-analyst-backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt

2. Environment Variables

Create a .env file with:

# Database
DATABASE_URL=sqlite:///./auto_analyst.db  # For development
# DATABASE_URL=postgresql://user:pass@host:port/db  # For production

# AI Models
ANTHROPIC_API_KEY=your_anthropic_key_here
OPENAI_API_KEY=your_openai_key_here

# Authentication (optional)
ADMIN_API_KEY=your_admin_key_here
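Before starting the server, it can help to check that the variables above are actually set. The following is a minimal sketch, not part of the backend: `check_env` is a hypothetical helper, and the rules it applies (DATABASE_URL required, at least one model key required, ADMIN_API_KEY optional) are assumptions drawn from the comments in the example file.

```python
from typing import List, Mapping

# Assumption: the database URL is always needed.
REQUIRED_KEYS = ("DATABASE_URL",)
# Assumption: one model provider key is enough to run the agents.
MODEL_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def check_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the config looks usable."""
    problems = [f"{key} is not set" for key in REQUIRED_KEYS if not env.get(key)]
    if not any(env.get(key) for key in MODEL_KEYS):
        problems.append("set at least one of: " + ", ".join(MODEL_KEYS))
    return problems


# Demo with a sample mapping that is missing both model keys.
sample_env = {"DATABASE_URL": "sqlite:///./auto_analyst.db"}
for problem in check_env(sample_env):
    print(problem)
```

In a real run, pass `os.environ` instead of the sample mapping once the .env values have been loaded into the process environment.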

3. Database Initialization

# Initialize database and default agents
python -c "
from src.db.init_db import init_db
init_db()
print('✅ Database initialized successfully')
"

4. Start the Server

# Development server
python app.py

# Or with uvicorn
uvicorn app:app --reload --host 0.0.0.0 --port 8000

5. Verify Setup

Visit http://localhost:8000/docs for the interactive API documentation.

📚 Key Files to Understand

Core Application Files

app.py - Main FastAPI application and core endpoints
src/agents/agents.py - Agent definitions and orchestration
src/agents/deep_agents.py - Deep analysis system
src/db/schemas/models.py - Database models
src/managers/chat_manager.py - Chat and session management

Route Files (API Endpoints)

src/routes/session_routes.py - File uploads, sessions, authentication
src/routes/chat_routes.py - Chat and messaging
src/routes/code_routes.py - Code execution and processing
src/routes/templates_routes.py - Agent template management
src/routes/deep_analysis_routes.py - Deep analysis reports
src/routes/analytics_routes.py - Usage analytics and monitoring

Configuration Files

agents_config.json - Agent and template definitions
requirements.txt - Python dependencies
alembic.ini - Database migration configuration

🔧 Development Workflow

1. Adding New Agents

# 1. Define agent signature in src/agents/agents.py
class new_agent(dspy.Signature):
    """Agent description"""
    goal = dspy.InputField(desc="Analysis goal")
    dataset = dspy.InputField(desc="Dataset info")
    result = dspy.OutputField(desc="Analysis result")

# 2. Add to agents_config.json
{
  "template_name": "new_agent",
  "description": "Agent description",
  "variant_type": "both",
  "is_premium": false,
  "usage_count": 0
}

# 3. Register in agent loading system

2. Adding New Endpoints

# 1. Create route in src/routes/feature_routes.py
from fastapi import APIRouter
router = APIRouter(prefix="/feature", tags=["feature"])

@router.get("/endpoint")
async def new_endpoint():
    return {"message": "Hello"}

# 2. Register in app.py
from src.routes.feature_routes import router as feature_router
app.include_router(feature_router)

3. Database Changes

# 1. Modify models in src/db/schemas/models.py
# 2. Create migration
alembic revision --autogenerate -m "description"
# 3. Apply migration
alembic upgrade head

🧪 Testing Your Changes

1. Test API Endpoints

# Use the interactive docs (open is macOS-only; on Linux use xdg-open, or paste the URL into a browser)
open http://localhost:8000/docs

# Or use curl
curl -X GET "http://localhost:8000/health"

2. Test Agent System

# Test individual agent
python -c "
from src.agents.agents import preprocessing_agent
import dspy
dspy.configure(lm=dspy.LM('anthropic/claude-sonnet-4-20250514'))
agent = dspy.ChainOfThought(preprocessing_agent)
result = agent(goal='clean data', dataset='test data')
print(result)
"

3. Test Database Operations

# Test database
python -c "
from src.db.init_db import session_factory
from src.db.schemas.models import AgentTemplate
session = session_factory()
templates = session.query(AgentTemplate).all()
print(f'Found {len(templates)} templates')
session.close()
"

🔍 Common Development Tasks

Adding a New Feature

Plan the Feature: Define requirements and API design
Database Changes: Add new models if needed
Create Routes: Add API endpoints in src/routes/
Business Logic: Add managers in src/managers/ if complex
Documentation: Update relevant .md files
Testing: Test endpoints and integration

Debugging Issues

Check Logs: Application logs show detailed error information
Database State: Verify data with database queries
API Testing: Use /docs interface for endpoint testing
Agent Behavior: Test individual agents separately
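For the "Database State" step, a quick row count per table is often enough to see whether initialization ran. This sketch uses only the standard-library sqlite3 module and assumes the development SQLite database; `table_row_counts` is a hypothetical helper, and the demo table exists only for this example.

```python
import sqlite3
from typing import Dict


def table_row_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Map each user table in the database to its row count."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'  # quote the identifier safely
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Demo against an in-memory database with a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_templates (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO demo_templates (name) VALUES (?)", [("a",), ("b",)])
counts = table_row_counts(conn)
print(counts)
conn.close()
```

To inspect the development database, connect to the file named in DATABASE_URL (auto_analyst.db in the example configuration) instead of ":memory:".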

Performance Optimization

Database Queries: Optimize SQLAlchemy queries (for example, select only the columns you need and avoid N+1 lazy loads)
Agent Execution: Implement async patterns for agent orchestration
Resource Management: Monitor memory usage for large datasets
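As an illustration of the async pattern mentioned under "Agent Execution", independent agents can be awaited concurrently with asyncio.gather. This is a sketch only: `run_agent` is a hypothetical coroutine in which a short sleep stands in for a model call, and it says nothing about how the backend actually schedules agents.

```python
import asyncio
from typing import Dict, List


async def run_agent(name: str, goal: str) -> Dict[str, str]:
    """Hypothetical agent call; the sleep stands in for a model request."""
    await asyncio.sleep(0.01)
    return {"agent": name, "result": f"{name} handled: {goal}"}


async def run_agents_concurrently(names: List[str], goal: str) -> List[Dict[str, str]]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(run_agent(name, goal) for name in names))


agent_names = ["preprocessing_agent", "statistical_analytics_agent", "data_viz_agent"]
results = asyncio.run(run_agents_concurrently(agent_names, "summarize sales"))
for item in results:
    print(item["agent"], "->", item["result"])
```

This only pays off when the agents do not depend on each other's output; steps that feed into one another still have to run in sequence.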

📊 System Architecture Overview

graph TD
    A[Frontend Request] --> B[FastAPI Router]
    B --> C[Route Handler]
    C --> D[Manager Layer]
    D --> E[Database Layer]
    D --> F[Agent System]
    F --> G[AI Models]
    G --> H[Code Generation]
    H --> I[Execution Environment]
    I --> J[Results Processing]
    J --> K[Response]
    
    subgraph "Agent Orchestration"
        F1[Individual Agents]
        F2[Planner Module]
        F3[Deep Analysis]
        F1 --> F2
        F2 --> F3
    end
    
    F --> F1

📈 Template Integration

The system uses active user templates for agent selection:

Default Agents (Always Available)

preprocessing_agent (individual & planner variants)
statistical_analytics_agent (individual & planner variants)
sk_learn_agent (individual & planner variants)
data_viz_agent (individual & planner variants)

Template Loading Logic

Individual Agent Execution (@agent_name): Loads ALL available templates
Planner Execution: Loads user's enabled templates (max 10 for performance)
Deep Analysis: Uses user's active template preferences
Fallback: Uses 4 core agents if no user preferences found
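The loading rules above can be sketched as a single selection function. The function name, the mode strings, and the argument shapes are assumptions made for illustration; only the four core agent names, the limit of 10, and the fallback behavior come from this guide.

```python
from typing import List, Optional, Sequence

# The four default agents listed in this guide.
CORE_AGENTS = [
    "preprocessing_agent",
    "statistical_analytics_agent",
    "sk_learn_agent",
    "data_viz_agent",
]
MAX_PLANNER_TEMPLATES = 10  # planner limit stated above


def select_templates(
    mode: str,
    all_templates: Sequence[str],
    user_enabled: Optional[Sequence[str]] = None,
) -> List[str]:
    """Pick the templates to load for a given execution mode."""
    if mode not in {"individual", "planner", "deep_analysis"}:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "individual":
        return list(all_templates)  # @agent_name can reach every template
    enabled = list(user_enabled or [])
    if not enabled:
        return list(CORE_AGENTS)  # fallback when no preferences are found
    if mode == "planner":
        return enabled[:MAX_PLANNER_TEMPLATES]  # cap for performance
    return enabled  # deep analysis uses the user's active preferences


# Hypothetical catalog: the core agents plus twelve custom templates.
catalog = CORE_AGENTS + [f"custom_agent_{i}" for i in range(12)]
print(len(select_templates("individual", catalog)))
print(len(select_templates("planner", catalog, catalog)))
print(select_templates("planner", catalog, []))
```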

This architecture ensures users can leverage their preferred agents while maintaining system performance and reliability.