# Auto-Analyst Backend Documentation
This directory contains comprehensive documentation for the Auto-Analyst backend - a sophisticated multi-agent AI platform for data analysis built with FastAPI, DSPy, and modern Python technologies.
## Documentation Structure

### Architecture (`/architecture/`)
- System Architecture - Comprehensive overview of backend system design, components, and data flow patterns
### Development (`/development/`)
- Development Workflow - Complete development guide with patterns, best practices, and code organization principles
### System (`/system/`)
- Database Schema - Complete database schema with all tables, relationships, and performance optimization
- Shared DataFrame System - Inter-agent data sharing and session management
### API (`/api/`)
- API Endpoints Overview - Main API reference hub
- Route Documentation - Detailed endpoint documentation:
  - Core Routes - File uploads, sessions, authentication
  - Chat Routes - Chat and messaging endpoints
  - Code Routes - Code execution and processing
  - Analytics Routes - Usage analytics and monitoring
  - Deep Analysis Routes - Multi-agent analysis system
  - Template Routes - Agent template management
  - Feedback Routes - User feedback and rating system
### Troubleshooting (`/troubleshooting/`)
- Troubleshooting Guide - Common issues, debugging tools, and solutions
## Backend Overview

### Tech Stack
- FastAPI - Modern async Python web framework
- DSPy - AI agent orchestration and LLM integration
- SQLAlchemy - Database ORM with PostgreSQL/SQLite support
- Plotly - Interactive data visualizations
- Pandas/NumPy - Data manipulation and analysis
- Scikit-learn - Machine learning models
- Statsmodels - Statistical analysis
### Core Features
- Multi-Agent System - 4+ specialized AI agents for different analysis tasks
- Template System - User-customizable agent configurations
- Deep Analysis - Multi-step analytical workflows with streaming progress
- Session Management - Stateful user sessions with shared data context
- Code Execution - Safe Python code execution environment (see the sandboxing sketch after this list)
- Real-time Streaming - WebSocket support for live analysis updates
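
As an illustration of the code execution feature, here is a minimal, hypothetical sketch of sandboxing generated code. The helper name, the restricted builtins, and the captured-stdout approach are assumptions for illustration, not the backend's actual implementation (a production sandbox needs process-level isolation):

```python
import contextlib
import io

def run_generated_code(code: str, context: dict) -> str:
    """Hypothetical sketch: run generated code in a restricted namespace
    and capture anything it prints. Illustration only, not the real sandbox."""
    allowed_builtins = {"print": print, "len": len, "range": range, "sum": sum}
    namespace = {"__builtins__": allowed_builtins, **context}
    stdout = io.StringIO()
    with contextlib.redirect_stdout(stdout):
        exec(code, namespace)  # illustration only; real isolation is stronger
    return stdout.getvalue()

# Example: execute a one-line snippet against shared session data
print(run_generated_code("print(sum(values))", {"values": [1, 2, 3]}))
```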
### Agent Types

Each agent is defined as a DSPy signature (see the sketch after this list):
- Data Preprocessing Agent - Data cleaning and preparation
- Statistical Analytics Agent - Statistical analysis using statsmodels
- Machine Learning Agent - ML modeling with scikit-learn
- Data Visualization Agent - Interactive charts with Plotly
- Feature Engineering Agent (Premium) - Advanced feature creation
- Polars Agent (Premium) - High-performance data processing
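
As a hedged illustration of that pattern, here is a minimal sketch of what an agent signature can look like; the class name, docstring, and field descriptions below are assumptions, not the project's actual definitions in `src/agents/agents.py`:

```python
import dspy

# Illustrative only: names and field descriptions are assumptions.
class preprocessing_signature(dspy.Signature):
    """Turn an analysis goal plus a dataset summary into cleaning code."""
    goal = dspy.InputField(desc="What the user wants to achieve")
    dataset = dspy.InputField(desc="Schema/summary of the uploaded data")
    code = dspy.OutputField(desc="Generated pandas preprocessing code")

# Signatures are wrapped in a reasoning module before use,
# as the Agent Testing example later in this document shows.
agent = dspy.ChainOfThought(preprocessing_signature)
```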
## Quick Start Guide

### 1. Environment Setup
```bash
# Navigate to backend directory
cd Auto-Analyst-CS/auto-analyst-backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt
```
### 2. Environment Configuration

Create a `.env` file with the required variables:
```bash
# Database Configuration
DATABASE_URL=sqlite:///./chat_database.db

# AI Model Configuration
OPENAI_API_KEY=your-openai-api-key
MODEL_PROVIDER=openai  # openai, anthropic, groq, gemini
MODEL_NAME=gpt-4o-mini
TEMPERATURE=0.7
MAX_TOKENS=6000

# Optional: Additional AI Providers
ANTHROPIC_API_KEY=your-anthropic-key
GROQ_API_KEY=your-groq-key
GEMINI_API_KEY=your-gemini-key

# Security
ADMIN_API_KEY=your-admin-key

# Application Settings
ENVIRONMENT=development
FRONTEND_URL=http://localhost:3000/
```
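
How these variables are consumed is implementation-specific; as a hedged sketch of the pattern, assuming plain `os.getenv` access rather than the project's actual settings module:

```python
import os

# Minimal sketch of reading the variables above; the backend's actual
# settings loading may differ (e.g. pydantic-settings or python-dotenv).
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///./chat_database.db")
MODEL_PROVIDER = os.getenv("MODEL_PROVIDER", "openai")
MODEL_NAME = os.getenv("MODEL_NAME", "gpt-4o-mini")
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.7"))
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "6000"))

print(f"Using {MODEL_PROVIDER}/{MODEL_NAME} at temperature {TEMPERATURE}")
```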
### 3. Database Initialization

```bash
# Initialize database and default agents
python -c "
from src.db.init_db import init_db
init_db()
print('Database and agents initialized successfully')
"
```
### 4. Start the Development Server

```bash
# Start the FastAPI server
python -m app

# Or with uvicorn for more control
uvicorn app:app --reload --host 0.0.0.0 --port 8000
```
### 5. Verify Installation

- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health (a scripted check is sketched below)
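
The health check can also be scripted; a minimal sketch using the `requests` library (an assumption; any HTTP client works), expecting HTTP 200 from a running server:

```python
import requests

# Minimal smoke test against the local server; assumes /health
# returns HTTP 200 when the backend is up.
resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
print("Backend is healthy:", resp.status_code)
```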
## Development Workflow

### Adding New Agents

- Define the agent signature in `src/agents/agents.py` (see the sketch below)
- Add its configuration to `agents_config.json`
- Register the agent in the loading system
- Test integration with the multi-agent pipeline
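
A hypothetical walkthrough of the first three steps; the signature, the config entry's keys, and the `register_agent` helper are all assumptions about the project's conventions, not its actual code:

```python
import dspy

# Step 1 (sketch): define the new signature in src/agents/agents.py.
class outlier_detection_agent(dspy.Signature):
    """Detect and flag outliers in the shared dataset."""
    goal = dspy.InputField(desc="User's analysis goal")
    dataset = dspy.InputField(desc="Dataset description")
    code = dspy.OutputField(desc="Generated outlier-detection code")

# Step 2 (sketch): the matching agents_config.json entry, shown as a
# Python dict because the real JSON schema is not documented here.
AGENT_CONFIG_ENTRY = {
    "name": "outlier_detection_agent",
    "description": "Detects and flags outliers",
    "premium": False,
}

# Step 3 (sketch): registration depends on the project's loading system;
# a hypothetical call might look like register_agent(outlier_detection_agent).
```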
### Adding New API Endpoints

- Create a route file in `src/routes/` (a sketch of a complete route file follows below)
- Define Pydantic models for request/response validation
- Implement the endpoints with proper error handling
- Register the router in `app.py`
- Update the documentation
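
As a hedged sketch of these steps, a hypothetical route file; the file path, model fields, and endpoint are illustrative assumptions:

```python
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

# Hypothetical route file, e.g. src/routes/example_routes.py
router = APIRouter(prefix="/example", tags=["example"])

class EchoRequest(BaseModel):
    message: str

class EchoResponse(BaseModel):
    message: str

@router.post("/echo", response_model=EchoResponse)
async def echo(payload: EchoRequest) -> EchoResponse:
    # Validate beyond Pydantic's type checks and fail with a clear status code
    if not payload.message.strip():
        raise HTTPException(status_code=422, detail="message must not be empty")
    return EchoResponse(message=payload.message)

# Final step (sketch), in app.py: app.include_router(router)
```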
### Database Changes

- Modify the models in `src/db/schemas/models.py` (see the model sketch below)
- Create a migration: `alembic revision --autogenerate -m "description"`
- Apply the migration: `alembic upgrade head`
- Update the documentation
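
To make the first step concrete, a hypothetical model addition; the table and column names are illustrative assumptions, not the real schema:

```python
from sqlalchemy import Column, DateTime, Integer, String, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()

# Hypothetical new model in src/db/schemas/models.py; Alembic's
# --autogenerate would pick this up and emit a migration.
class AnalysisNote(Base):
    __tablename__ = "analysis_notes"

    id = Column(Integer, primary_key=True)
    session_id = Column(String, index=True, nullable=False)
    note = Column(String, nullable=False)
    created_at = Column(DateTime, server_default=func.now())
```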
## System Architecture

### Request Processing Flow

```
HTTP Request → FastAPI Router → Route Handler → Business Logic →
Database/Agent System → AI Model → Response Processing → JSON Response
```
### Agent Execution Flow

```
User Query → Session Manager → Agent Selection → Context Preparation →
DSPy Chain → AI Model → Code Generation → Execution → Response Formatting
```
### Deep Analysis Workflow

```
Goal Input → Question Generation → Planning → Multi-Agent Execution →
Code Synthesis → Result Compilation → HTML Report Generation
```
## Testing & Validation

### API Testing
```bash
# Interactive documentation
open http://localhost:8000/docs

# cURL examples
curl -X GET "http://localhost:8000/health"

curl -X POST "http://localhost:8000/chat/preprocessing_agent" \
  -H "Content-Type: application/json" \
  -d '{"query": "Clean this dataset", "session_id": "test"}'
```
### Agent Testing

```python
# Test individual agents
import dspy
from src.agents.agents import preprocessing_agent

# Configure DSPy with a model and API key
lm = dspy.LM('openai/gpt-4o-mini', api_key='your-key')
dspy.configure(lm=lm)

# Wrap the agent signature in a reasoning module and run a sample query
agent = dspy.ChainOfThought(preprocessing_agent)
result = agent(goal='clean data', dataset='test dataset')
print(result)
```
## Security & Production

### Security Features
- Session-based authentication with secure session management
- API key protection for admin endpoints (see the dependency sketch after this list)
- Input validation using Pydantic models
- Error handling with proper HTTP status codes
- CORS configuration for frontend integration
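
As an illustration of the admin API key check, a minimal FastAPI dependency sketch; the `X-Admin-Key` header name and the endpoint are assumptions, not the backend's actual code:

```python
import os

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

# Hypothetical dependency: compare a request header against ADMIN_API_KEY.
def require_admin_key(x_admin_key: str = Header(...)) -> None:
    if x_admin_key != os.getenv("ADMIN_API_KEY"):
        raise HTTPException(status_code=403, detail="Invalid admin API key")

@app.get("/admin/stats", dependencies=[Depends(require_admin_key)])
async def admin_stats() -> dict:
    return {"status": "ok"}
```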
### Production Considerations
- PostgreSQL database for production deployment
- Environment variable management for secrets
- Logging configuration for monitoring
- Rate limiting for API protection
- Performance optimization for large datasets
## Monitoring & Analytics
The backend includes comprehensive analytics for:
- Usage tracking - API endpoint usage and performance
- Model usage - AI model consumption and costs
- User analytics - User behavior and engagement
- Error monitoring - System health and error tracking
- Performance metrics - Response times and throughput
## Contributing
- Follow the coding standards defined in the development workflow
- Add comprehensive tests for new features
- Update documentation for all changes
- Use proper error handling patterns
- Submit detailed pull requests with clear descriptions
## Detailed Documentation
For specific implementation details, refer to the organized documentation in each subdirectory:
- Getting Started Guide - Complete setup walkthrough
- Architecture Documentation - System design and components
- Development Guides - Workflow and best practices
- API Reference - Complete endpoint documentation
- System Documentation - Database and core systems
- Troubleshooting - Debugging and solutions
Need help? Check the troubleshooting guide or refer to the comprehensive documentation in each section.