# Auto-Analyst Backend Documentation
This directory contains comprehensive documentation for the Auto-Analyst backend - a sophisticated multi-agent AI platform for data analysis built with FastAPI, DSPy, and modern Python technologies.
## Documentation Structure

### Architecture (`/architecture/`)
- System Architecture - Comprehensive overview of backend system design, components, and data flow patterns
### Development (`/development/`)
- Development Workflow - Complete development guide with patterns, best practices, and code organization principles
### System (`/system/`)
- Database Schema - Complete database schema with all tables, relationships, and performance optimization
- Shared DataFrame System - Inter-agent data sharing and session management
### API (`/api/`)
- API Endpoints Overview - Main API reference hub
- Route Documentation - Detailed endpoint documentation:
  - Core Routes - File uploads, sessions, authentication
  - Chat Routes - Chat and messaging endpoints
  - Code Routes - Code execution and processing
  - Analytics Routes - Usage analytics and monitoring
  - Deep Analysis Routes - Multi-agent analysis system
  - Template Routes - Agent template management
  - Feedback Routes - User feedback and rating system
### Troubleshooting (`/troubleshooting/`)
- Troubleshooting Guide - Common issues, debugging tools, and solutions
## Backend Overview

### Tech Stack
- FastAPI - Modern async Python web framework
- DSPy - AI agent orchestration and LLM integration
- SQLAlchemy - Database ORM with PostgreSQL/SQLite support
- Plotly - Interactive data visualizations
- Pandas/NumPy - Data manipulation and analysis
- Scikit-learn - Machine learning models
- Statsmodels - Statistical analysis
### Core Features
- Multi-Agent System - 4+ specialized AI agents for different analysis tasks
- Template System - User-customizable agent configurations
- Deep Analysis - Multi-step analytical workflows with streaming progress
- Session Management - Stateful user sessions with shared data context
- Code Execution - Safe Python code execution environment (see the sandboxing sketch after this list)
- Real-time Streaming - WebSocket support for live analysis updates
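
As an illustration of the code execution feature, here is a minimal, hypothetical sketch of sandboxing generated code. The helper name, the restricted builtins, and the captured-stdout approach are assumptions for illustration, not the backend's actual implementation (a production sandbox needs process-level isolation):

```python
import contextlib
import io

def run_generated_code(code: str, context: dict) -> str:
    """Hypothetical sketch: run generated code in a restricted namespace
    and capture anything it prints. Illustration only, not the real sandbox."""
    allowed_builtins = {"print": print, "len": len, "range": range, "sum": sum}
    namespace = {"__builtins__": allowed_builtins, **context}
    stdout = io.StringIO()
    with contextlib.redirect_stdout(stdout):
        exec(code, namespace)  # illustration only; real isolation is stronger
    return stdout.getvalue()

# Example: execute a one-line snippet against shared session data
print(run_generated_code("print(sum(values))", {"values": [1, 2, 3]}))
```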
### Agent Types

Each agent is defined as a DSPy signature (see the sketch after this list):
- Data Preprocessing Agent - Data cleaning and preparation
- Statistical Analytics Agent - Statistical analysis using statsmodels
- Machine Learning Agent - ML modeling with scikit-learn
- Data Visualization Agent - Interactive charts with Plotly
- Feature Engineering Agent (Premium) - Advanced feature creation
- Polars Agent (Premium) - High-performance data processing
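
As a hedged illustration of that pattern, here is a minimal sketch of what an agent signature can look like; the class name, docstring, and field descriptions below are assumptions, not the project's actual definitions in `src/agents/agents.py`:

```python
import dspy

# Illustrative only: names and field descriptions are assumptions.
class preprocessing_signature(dspy.Signature):
    """Turn an analysis goal plus a dataset summary into cleaning code."""
    goal = dspy.InputField(desc="What the user wants to achieve")
    dataset = dspy.InputField(desc="Schema/summary of the uploaded data")
    code = dspy.OutputField(desc="Generated pandas preprocessing code")

# Signatures are wrapped in a reasoning module before use,
# as the Agent Testing example later in this document shows.
agent = dspy.ChainOfThought(preprocessing_signature)
```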
## Quick Start Guide

### 1. Environment Setup
```bash
# Navigate to backend directory
cd Auto-Analyst-CS/auto-analyst-backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows

# Install dependencies
pip install -r requirements.txt
```
### 2. Environment Configuration

Create a `.env` file with the required variables:
```bash
# Database Configuration
DATABASE_URL=sqlite:///./chat_database.db

# AI Model Configuration
OPENAI_API_KEY=your-openai-api-key
MODEL_PROVIDER=openai  # openai, anthropic, groq, gemini
MODEL_NAME=gpt-4o-mini
TEMPERATURE=0.7
MAX_TOKENS=6000

# Optional: Additional AI Providers
ANTHROPIC_API_KEY=your-anthropic-key
GROQ_API_KEY=your-groq-key
GEMINI_API_KEY=your-gemini-key

# Security
ADMIN_API_KEY=your-admin-key

# Application Settings
ENVIRONMENT=development
FRONTEND_URL=http://localhost:3000/
```
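
How these variables are consumed is implementation-specific; as a hedged sketch of the pattern, assuming plain `os.getenv` access rather than the project's actual settings module:

```python
import os

# Minimal sketch of reading the variables above; the backend's actual
# settings loading may differ (e.g. pydantic-settings or python-dotenv).
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite:///./chat_database.db")
MODEL_PROVIDER = os.getenv("MODEL_PROVIDER", "openai")
MODEL_NAME = os.getenv("MODEL_NAME", "gpt-4o-mini")
TEMPERATURE = float(os.getenv("TEMPERATURE", "0.7"))
MAX_TOKENS = int(os.getenv("MAX_TOKENS", "6000"))

print(f"Using {MODEL_PROVIDER}/{MODEL_NAME} at temperature {TEMPERATURE}")
```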
### 3. Database Initialization

```bash
# Initialize database and default agents
python -c "
from src.db.init_db import init_db
init_db()
print('Database and agents initialized successfully')
"
```
### 4. Start the Development Server

```bash
# Start the FastAPI server
python -m app

# Or with uvicorn for more control
uvicorn app:app --reload --host 0.0.0.0 --port 8000
```
### 5. Verify Installation

- API Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health (a scripted check is sketched below)
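
The health check can also be scripted; a minimal sketch using the `requests` library (an assumption; any HTTP client works), expecting HTTP 200 from a running server:

```python
import requests

# Minimal smoke test against the local server; assumes /health
# returns HTTP 200 when the backend is up.
resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
print("Backend is healthy:", resp.status_code)
```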
## Development Workflow

### Adding New Agents

- Define the agent signature in `src/agents/agents.py` (see the sketch below)
- Add its configuration to `agents_config.json`
- Register the agent in the loading system
- Test integration with the multi-agent pipeline
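
A hypothetical walkthrough of the first three steps; the signature, the config entry's keys, and the `register_agent` helper are all assumptions about the project's conventions, not its actual code:

```python
import dspy

# Step 1 (sketch): define the new signature in src/agents/agents.py.
class outlier_detection_agent(dspy.Signature):
    """Detect and flag outliers in the shared dataset."""
    goal = dspy.InputField(desc="User's analysis goal")
    dataset = dspy.InputField(desc="Dataset description")
    code = dspy.OutputField(desc="Generated outlier-detection code")

# Step 2 (sketch): the matching agents_config.json entry, shown as a
# Python dict because the real JSON schema is not documented here.
AGENT_CONFIG_ENTRY = {
    "name": "outlier_detection_agent",
    "description": "Detects and flags outliers",
    "premium": False,
}

# Step 3 (sketch): registration depends on the project's loading system;
# a hypothetical call might look like register_agent(outlier_detection_agent).
```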
### Adding New API Endpoints

- Create a route file in `src/routes/` (a sketch of a complete route file follows below)
- Define Pydantic models for request/response validation
- Implement the endpoints with proper error handling
- Register the router in `app.py`
- Update the documentation
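
As a hedged sketch of these steps, a hypothetical route file; the file path, model fields, and endpoint are illustrative assumptions:

```python
from fastapi import APIRouter, HTTPException
from pydantic import BaseModel

# Hypothetical route file, e.g. src/routes/example_routes.py
router = APIRouter(prefix="/example", tags=["example"])

class EchoRequest(BaseModel):
    message: str

class EchoResponse(BaseModel):
    message: str

@router.post("/echo", response_model=EchoResponse)
async def echo(payload: EchoRequest) -> EchoResponse:
    # Validate beyond Pydantic's type checks and fail with a clear status code
    if not payload.message.strip():
        raise HTTPException(status_code=422, detail="message must not be empty")
    return EchoResponse(message=payload.message)

# Final step (sketch), in app.py: app.include_router(router)
```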
### Database Changes

- Modify the models in `src/db/schemas/models.py` (see the model sketch below)
- Create a migration: `alembic revision --autogenerate -m "description"`
- Apply the migration: `alembic upgrade head`
- Update the documentation
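
To make the first step concrete, a hypothetical model addition; the table and column names are illustrative assumptions, not the real schema:

```python
from sqlalchemy import Column, DateTime, Integer, String, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()

# Hypothetical new model in src/db/schemas/models.py; Alembic's
# --autogenerate would pick this up and emit a migration.
class AnalysisNote(Base):
    __tablename__ = "analysis_notes"

    id = Column(Integer, primary_key=True)
    session_id = Column(String, index=True, nullable=False)
    note = Column(String, nullable=False)
    created_at = Column(DateTime, server_default=func.now())
```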
## System Architecture

### Request Processing Flow

```
HTTP Request → FastAPI Router → Route Handler → Business Logic →
Database/Agent System → AI Model → Response Processing → JSON Response
```
### Agent Execution Flow

```
User Query → Session Manager → Agent Selection → Context Preparation →
DSPy Chain → AI Model → Code Generation → Execution → Response Formatting
```
### Deep Analysis Workflow

```
Goal Input → Question Generation → Planning → Multi-Agent Execution →
Code Synthesis → Result Compilation → HTML Report Generation
```
## Testing & Validation

### API Testing
```bash
# Interactive documentation
open http://localhost:8000/docs

# cURL examples
curl -X GET "http://localhost:8000/health"

curl -X POST "http://localhost:8000/chat/preprocessing_agent" \
  -H "Content-Type: application/json" \
  -d '{"query": "Clean this dataset", "session_id": "test"}'
```
### Agent Testing

```python
# Test individual agents
import dspy
from src.agents.agents import preprocessing_agent

# Configure DSPy with a model and API key
lm = dspy.LM('openai/gpt-4o-mini', api_key='your-key')
dspy.configure(lm=lm)

# Wrap the agent signature in a reasoning module and run a sample query
agent = dspy.ChainOfThought(preprocessing_agent)
result = agent(goal='clean data', dataset='test dataset')
print(result)
```
## Security & Production

### Security Features
- Session-based authentication with secure session management
- API key protection for admin endpoints (see the dependency sketch after this list)
- Input validation using Pydantic models
- Error handling with proper HTTP status codes
- CORS configuration for frontend integration
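
As an illustration of the admin API key check, a minimal FastAPI dependency sketch; the `X-Admin-Key` header name and the endpoint are assumptions, not the backend's actual code:

```python
import os

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()

# Hypothetical dependency: compare a request header against ADMIN_API_KEY.
def require_admin_key(x_admin_key: str = Header(...)) -> None:
    if x_admin_key != os.getenv("ADMIN_API_KEY"):
        raise HTTPException(status_code=403, detail="Invalid admin API key")

@app.get("/admin/stats", dependencies=[Depends(require_admin_key)])
async def admin_stats() -> dict:
    return {"status": "ok"}
```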
### Production Considerations
- PostgreSQL database for production deployment
- Environment variable management for secrets
- Logging configuration for monitoring
- Rate limiting for API protection
- Performance optimization for large datasets
## Monitoring & Analytics
The backend includes comprehensive analytics for:
- Usage tracking - API endpoint usage and performance
- Model usage - AI model consumption and costs
- User analytics - User behavior and engagement
- Error monitoring - System health and error tracking
- Performance metrics - Response times and throughput
## Contributing
- Follow the coding standards defined in the development workflow
- Add comprehensive tests for new features
- Update documentation for all changes
- Use proper error handling patterns
- Submit detailed pull requests with clear descriptions
## Detailed Documentation
For specific implementation details, refer to the organized documentation in each subdirectory:
- Getting Started Guide - Complete setup walkthrough
- Architecture Documentation - System design and components
- Development Guides - Workflow and best practices
- API Reference - Complete endpoint documentation
- System Documentation - Database and core systems
- Troubleshooting - Debugging and solutions
Need help? Check the troubleshooting guide or refer to the comprehensive documentation in each section.