# Auto-Analyst Backend System Architecture

## Overview

Auto-Analyst is a multi-agent AI platform for data analysis. The backend orchestrates specialized AI agents, manages user sessions, and exposes an API for data processing and analysis workflows.

## 🏗️ High-Level Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Frontend      │    │     Backend      │    │    Database     │
│   (Next.js)     │◄──►│    (FastAPI)     │◄──►│ (PostgreSQL/    │
│                 │    │                  │    │  SQLite)        │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                              │
                              ▼
                       ┌──────────────────┐
                       │   AI Models      │
                       │   (DSPy/LLMs)    │
                       └──────────────────┘
                              │
                              ▼
                       ┌──────────────────┐
                       │ Agent System     │
                       │ [Processing]     │
                       │ [Analytics]      │
                       │ [Visualization]  │
                       └──────────────────┘
```

## 🎯 Core Components

### 1. Application Layer (`app.py`)

**FastAPI Application Server**
- **Role**: Main HTTP server and request router
- **Responsibilities**:
  - Request/response handling
  - Session-based authentication
  - Route registration and middleware
  - Error handling and logging
  - Static file serving
  - CORS configuration

**Key Features**:
- Async/await support for high concurrency
- Automatic API documentation generation
- Request validation with Pydantic
- Session management for user tracking

### 2. Agent System (`src/agents/`)

**Multi-Agent Orchestra**
- **Core Agents**: Specialized AI agents for different analysis tasks
- **Deep Analysis**: Advanced multi-agent coordination system
- **Template System**: User-customizable agent configurations

#### Agent Types

1. **Individual Agents** (`agents.py`):
   ```python
   - preprocessing_agent         # Data cleaning and preparation
   - statistical_analytics_agent # Statistical analysis
   - sk_learn_agent             # Machine learning with scikit-learn
   - data_viz_agent             # Data visualization
   - basic_qa_agent             # General Q&A
   ```

2. **Planner Agents** (Multi-agent coordination):
   ```python
   - planner_preprocessing_agent
   - planner_statistical_analytics_agent
   - planner_sk_learn_agent
   - planner_data_viz_agent
   ```

3. **Deep Analysis System** (`deep_agents.py`):
   ```python
   - deep_questions         # Question generation
   - deep_planner          # Execution planning
   - deep_code_synthesizer # Code combination
   - deep_synthesizer      # Result synthesis
   - final_conclusion      # Report generation
   ```

#### Agent Architecture Pattern

```python
import dspy


class AgentSignature(dspy.Signature):
    """Agent description and purpose"""
    # Inputs supplied by the caller
    goal = dspy.InputField(desc="Analysis objective")
    dataset = dspy.InputField(desc="Dataset information")
    plan_instructions = dspy.InputField(desc="Execution plan")

    # Outputs produced by the language model
    summary = dspy.OutputField(desc="Analysis summary")
    code = dspy.OutputField(desc="Generated code")
```
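
The input/output contract of this pattern can be exercised without an LLM. The following is a minimal, dependency-free sketch: `AgentInput`, `AgentOutput`, `fake_preprocessing_agent`, `AGENTS`, and `run_agent` are hypothetical names introduced for illustration and are not taken from the codebase.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class AgentInput:
    """Mirrors the three input fields of the signature pattern."""
    goal: str
    dataset: str
    plan_instructions: str


@dataclass(frozen=True)
class AgentOutput:
    """Mirrors the two output fields of the signature pattern."""
    summary: str
    code: str


def fake_preprocessing_agent(inputs: AgentInput) -> AgentOutput:
    """Hypothetical stand-in for an LLM-backed agent; returns fixed code."""
    code = "df = df.dropna()  # placeholder generated code"
    summary = f"Plan for '{inputs.goal}' on {inputs.dataset}"
    return AgentOutput(summary=summary, code=code)


# Hypothetical registry keyed by agent name.
AGENTS: Dict[str, Callable[[AgentInput], AgentOutput]] = {
    "preprocessing_agent": fake_preprocessing_agent,
}


def run_agent(name: str, inputs: AgentInput) -> AgentOutput:
    """Look up an agent by name and run it on the given inputs."""
    try:
        agent = AGENTS[name]
    except KeyError:
        raise ValueError(f"unknown agent: {name}") from None
    return agent(inputs)
```

Keeping inputs and outputs as small typed records makes it straightforward to swap the stub for a real DSPy module later, since callers depend only on the field names.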

### 3. Database Layer (`src/db/`)

**Data Persistence and Management**

#### Database Models (`schemas/models.py`):

```python
# Core Models
User              # User accounts and authentication
Chat              # Conversation sessions
Message           # Individual messages in chats
ModelUsage        # AI model usage tracking

# Template System
AgentTemplate     # Agent definitions and configurations
UserTemplatePreference  # User's enabled/disabled agents

# Deep Analysis
DeepAnalysisReport     # Analysis reports and results

# Analytics
CodeExecution     # Code execution tracking
UserAnalytics     # User behavior analytics
```

#### Database Architecture:

```
Users (1) ──────── (Many) Chats
  │                        │
  │                        ▼
  └─── (Many) ModelUsage ──┘
  │
  └─── (Many) UserTemplatePreference
               │
               ▼
         AgentTemplate
         AgentTemplate
```
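
The one-to-many relationships in the diagram can be sketched with the standard-library `sqlite3` module. This is an illustration only: the table and column names (`user_id`, `chat_id`, `total_tokens`, and so on) are assumptions, and the real models are defined with SQLAlchemy in `schemas/models.py`.

```python
import sqlite3

# Assumed column names; only the relationships come from the diagram.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE chats (
    chat_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    title TEXT
);
CREATE TABLE model_usage (
    usage_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    chat_id INTEGER REFERENCES chats(chat_id),
    total_tokens INTEGER NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one user, two chats, three usage rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO users (user_id, username) VALUES (1, 'demo')")
    conn.executemany(
        "INSERT INTO chats (chat_id, user_id, title) VALUES (?, 1, ?)",
        [(1, "first chat"), (2, "second chat")],
    )
    conn.executemany(
        "INSERT INTO model_usage (user_id, chat_id, total_tokens) VALUES (1, ?, ?)",
        [(1, 120), (1, 80), (2, 50)],
    )
    conn.commit()
    return conn


def tokens_per_chat(conn: sqlite3.Connection, user_id: int) -> dict:
    """Sum token usage per chat for one user."""
    rows = conn.execute(
        "SELECT chat_id, SUM(total_tokens) FROM model_usage "
        "WHERE user_id = ? GROUP BY chat_id ORDER BY chat_id",
        (user_id,),
    )
    return dict(rows.fetchall())
```

With foreign keys enabled, inserting a chat for a user that does not exist raises `sqlite3.IntegrityError`, which is the isolation property the diagram implies.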

### 4. Route Handlers (`src/routes/`)

**RESTful API Endpoints**

| Module | Purpose | Key Endpoints |
|--------|---------|---------------|
| `core_routes.py` | Core functionality | `/upload_excel`, `/session_info`, `/health` |
| `chat_routes.py` | Chat management | `/chats`, `/messages`, `/delete_chat` |
| `code_routes.py` | Code operations | `/execute_code`, `/get_latest_code` |
| `templates_routes.py` | Agent templates | `/templates`, `/user/{id}/enabled` |
| `deep_analysis_routes.py` | Deep analysis | `/reports`, `/download_from_db` |
| `analytics_routes.py` | System analytics | `/usage`, `/feedback`, `/costs` |
| `feedback_routes.py` | User feedback | `/feedback`, `/message/{id}/feedback` |

### 5. Business Logic Layer (`src/managers/`)

**Service Layer for Complex Operations**

#### Manager Components:

1. **`chat_manager.py`**:
   ```python
   - Session management
   - Message handling
   - Context preservation
   - Agent orchestration
   ```

2. **`ai_manager.py`**:
   ```python
   - Model selection and routing
   - Token tracking and cost calculation
   - Error handling and retries
   - Response formatting
   ```

3. **`session_manager.py`**:
   ```python
   - Session lifecycle management
   - Data sharing between agents
   - Memory management
   - Cleanup operations
   ```
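
The "error handling and retries" responsibility listed for `ai_manager.py` could look roughly like the helper below. It is a generic sketch of retrying with exponential backoff, not the project's implementation; `call_with_retries` and all of its parameters are hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on transient errors with exponential backoff."""
    if retries < 0:
        raise ValueError("retries must be non-negative")
    for attempt in range(retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # Delay doubles each time: base_delay, 2 * base_delay, ...
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper testable, since a test can record the delays instead of actually waiting.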

### 6. Utility Layer (`src/utils/`)

**Shared Services and Helpers**

- **`logger.py`**: Centralized logging system
- **`generate_report.py`**: HTML report generation
- **`model_registry.py`**: AI model configuration

## 🔄 Data Flow Architecture

### 1. Request Processing Flow

```
HTTP Request → FastAPI Router → Route Handler → Manager/Business Logic →
Database/Agent System → AI Model → Response Processing → JSON Response
```

### 2. Agent Execution Flow

```
User Query → Session Creation → Template Selection → Agent Loading →
Code Generation → Code Execution → Result Processing → Response Formatting
```
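
One way to picture this flow is as an ordered list of stage functions that each take and return a context dictionary. The sketch below covers only four of the stages (code execution and result processing are left out), and every function name and value in it is a placeholder rather than project code.

```python
from typing import Callable, Dict, List

Context = Dict[str, object]
Stage = Callable[[Context], Context]


def create_session(ctx: Context) -> Context:
    return {**ctx, "session_id": "demo-session"}  # placeholder ID


def select_template(ctx: Context) -> Context:
    return {**ctx, "agent": "data_viz_agent"}  # hard-coded for the sketch


def generate_code(ctx: Context) -> Context:
    return {**ctx, "code": "print('chart goes here')"}  # stand-in for LLM output


def format_response(ctx: Context) -> Context:
    response = {"agent": ctx["agent"], "code": ctx["code"]}
    return {**ctx, "response": response}


def run_pipeline(stages: List[Stage], ctx: Context) -> Context:
    """Apply each stage in order and record the stage names in ctx['trace']."""
    trace: List[str] = []
    for stage in stages:
        ctx = stage(ctx)
        trace.append(stage.__name__)
    return {**ctx, "trace": trace}


STAGES: List[Stage] = [create_session, select_template, generate_code, format_response]
```

Because each stage returns a new dictionary instead of mutating its input, a failed run leaves the caller's original context untouched.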

### 3. Deep Analysis Flow

```
Analysis Goal → Question Generation → Planning Phase → Agent Coordination →
Code Synthesis → Execution → Result Synthesis → Final Report Generation
```

### 4. Template System Flow

```
User Preferences → Template Loading → Agent Registration →
Capability Mapping → Execution Routing → Usage Tracking
```

## 🎨 Design Patterns

### 1. **Module Pattern**
- Clear separation of concerns
- Each module has specific responsibilities
- Minimal dependencies between modules

### 2. **Repository Pattern**
- Database access abstracted through SQLAlchemy
- Session management centralized
- Clean separation of data and business logic

### 3. **Strategy Pattern**
- Multiple AI models supported through unified interface
- Agent selection based on user preferences
- Dynamic template loading
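
As a rough illustration of the unified-interface idea, the sketch below defines a `ModelClient` protocol with two stub implementations chosen by provider name. The class names, the `complete` method, and the `PROVIDERS` table are assumptions for this example; in the project itself the unified interface is provided through DSPy.

```python
from typing import Callable, Dict, Protocol


class ModelClient(Protocol):
    """Interface every provider-specific client is expected to satisfy."""

    def complete(self, prompt: str) -> str: ...


class StubAnthropicClient:
    def complete(self, prompt: str) -> str:
        return f"[anthropic-stub] {prompt}"  # no network call in this sketch


class StubOpenAIClient:
    def complete(self, prompt: str) -> str:
        return f"[openai-stub] {prompt}"  # no network call in this sketch


# Hypothetical lookup table from provider name to client constructor.
PROVIDERS: Dict[str, Callable[[], ModelClient]] = {
    "anthropic": StubAnthropicClient,
    "openai": StubOpenAIClient,
}


def answer(provider: str, prompt: str) -> str:
    """Pick a strategy by name and delegate the call to it."""
    if provider not in PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    client = PROVIDERS[provider]()
    return client.complete(prompt)
```

Callers depend only on `answer` and the protocol, so adding a provider means adding one class and one table entry.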

### 4. **Observer Pattern**
- Usage tracking and analytics
- Event-driven model updates
- Real-time progress notifications
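
A minimal sketch of how usage tracking might hang off an observer-style event bus follows. `UsageEventBus`, `TokenCounter`, and the event keys (`model`, `tokens`) are hypothetical and only illustrate the pattern.

```python
from typing import Callable, Dict, List

UsageEvent = Dict[str, object]
Listener = Callable[[UsageEvent], None]


class UsageEventBus:
    """Keeps a list of listeners and notifies each one of every event."""

    def __init__(self) -> None:
        self._listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def publish(self, event: UsageEvent) -> None:
        # Iterate over a copy so a listener may subscribe others safely.
        for listener in list(self._listeners):
            listener(event)


class TokenCounter:
    """Example observer that accumulates token counts across events."""

    def __init__(self) -> None:
        self.total_tokens = 0

    def __call__(self, event: UsageEvent) -> None:
        self.total_tokens += int(event.get("tokens", 0))
```

The publisher never needs to know which observers exist, so analytics, logging, and progress notifications can be added independently.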

### 5. **Factory Pattern**
- Agent creation based on template configurations
- Session factory for database connections
- Dynamic model instantiation

## 🔧 Configuration Management

### Environment Configuration

```python
# Database
DATABASE_URL: str           # Database connection string
POSTGRES_PASSWORD: str      # PostgreSQL password (optional)

# AI Models
ANTHROPIC_API_KEY: str      # Claude API key
OPENAI_API_KEY: str         # OpenAI API key

# Authentication
ADMIN_API_KEY: str          # Admin operations key (optional)

# Deployment
PORT: int = 8000            # Server port
DEBUG: bool = False         # Debug mode
```
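
These variables could be read into a typed settings object along the following lines. This is a sketch under assumptions: the `Settings` class, the `load_settings` helper, the SQLite fallback URL, and the set of strings treated as true for `DEBUG` are all illustrative choices, not the project's actual loader.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    database_url: str
    postgres_password: Optional[str]
    anthropic_api_key: Optional[str]
    openai_api_key: Optional[str]
    admin_api_key: Optional[str]
    port: int = 8000
    debug: bool = False


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment-like mapping, applying defaults."""
    debug_raw = env.get("DEBUG", "false").strip().lower()
    return Settings(
        # Assumed fallback for local development only.
        database_url=env.get("DATABASE_URL", "sqlite:///./demo.db"),
        postgres_password=env.get("POSTGRES_PASSWORD"),
        anthropic_api_key=env.get("ANTHROPIC_API_KEY"),
        openai_api_key=env.get("OPENAI_API_KEY"),
        admin_api_key=env.get("ADMIN_API_KEY"),
        port=int(env.get("PORT", "8000")),
        debug=debug_raw in {"1", "true", "yes"},
    )
```

Accepting any mapping instead of reading `os.environ` directly lets tests pass a plain dictionary.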

### Agent Configuration (`agents_config.json`)

```json
{
  "default_agents": [
    {
      "template_name": "preprocessing_agent",
      "description": "Data cleaning and preparation",
      "variant_type": "both",
      "is_premium": false,
      "usage_count": 0,
      "icon_url": "preprocessing.svg"
    }
  ],
  "premium_templates": [...],
  "remove": [...]
}
```
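
A configuration of this shape might be filtered as sketched below. The sample entries, the `variant_type` values `"individual"` and `"planner"`, and the `available_templates` helper are assumptions made for the example; only `"both"` and the top-level keys appear in the configuration above, and the meaning of `remove` is not modeled here.

```python
import json
from typing import Dict, List

# Made-up sample data that follows the key names of agents_config.json.
CONFIG_TEXT = """
{
  "default_agents": [
    {"template_name": "preprocessing_agent", "variant_type": "both", "is_premium": false},
    {"template_name": "data_viz_agent", "variant_type": "individual", "is_premium": false}
  ],
  "premium_templates": [
    {"template_name": "example_premium_agent", "variant_type": "planner", "is_premium": true}
  ],
  "remove": []
}
"""


def available_templates(config: Dict, include_premium: bool, variant: str) -> List[str]:
    """Return template names usable for the given variant and access level."""
    templates = list(config.get("default_agents", []))
    if include_premium:
        templates += config.get("premium_templates", [])
    return [
        t["template_name"]
        for t in templates
        if t.get("variant_type") in (variant, "both")
    ]


CONFIG = json.loads(CONFIG_TEXT)
```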

## πŸ”’ Security Architecture

### Authentication & Authorization

1. **Session-based Authentication**:
   - Session IDs for user identification
   - Optional API key authentication for admin endpoints

2. **Input Validation**:
   - Pydantic models for request validation
   - SQL injection prevention through SQLAlchemy
   - File upload restrictions and validation

3. **Resource Protection**:
   - User-specific data isolation
   - Usage tracking and monitoring
   - Rate limiting considerations
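
For the optional admin API key, a comparison along these lines avoids leaking timing information. The function name `is_admin_request` is hypothetical, and how the key reaches the server (header, query parameter) is left open here.

```python
import hmac
from typing import Optional


def is_admin_request(provided_key: Optional[str], expected_key: Optional[str]) -> bool:
    """Return True only if an admin key is configured and the caller's key matches."""
    if not expected_key or not provided_key:
        # No key configured, or none supplied: never grant admin access.
        return False
    # compare_digest runs in time independent of where the inputs differ.
    return hmac.compare_digest(
        provided_key.encode("utf-8"), expected_key.encode("utf-8")
    )
```

Rejecting the request when no admin key is configured means an unset `ADMIN_API_KEY` cannot be matched by an empty value.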

### Data Security

1. **Database Security**:
   - Encrypted connections for PostgreSQL
   - Parameterized queries
   - Regular backup procedures

2. **Code Execution Security**:
   - Sandboxed code execution environment
   - Limited library imports
   - Timeout protection
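
The import restriction and timeout could be approximated as shown below. This is a simplified sketch, not the project's sandbox: the allowlist contents and both function names are assumptions, and a static import check does not stop dynamic imports such as `__import__`, so real isolation would still need OS-level or container-level controls.

```python
import ast
import subprocess
import sys
from typing import Set

ALLOWED_IMPORTS = {"math", "statistics", "json"}  # illustrative allowlist


def disallowed_imports(source: str, allowed: Set[str] = ALLOWED_IMPORTS) -> Set[str]:
    """Return top-level module names imported by source that are not allowed."""
    blocked: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports have no module name
        else:
            continue
        for name in names:
            root = name.split(".")[0]
            if root not in allowed:
                blocked.add(root)
    return blocked


def run_snippet(source: str, timeout_s: float = 5.0) -> str:
    """Run source in a separate interpreter process with a wall-clock timeout."""
    blocked = disallowed_imports(source)
    if blocked:
        raise PermissionError(f"imports not allowed: {sorted(blocked)}")
    completed = subprocess.run(
        [sys.executable, "-I", "-c", source],  # -I: isolated mode
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired when exceeded
        check=True,         # raises CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

Running the snippet in a child process means a runaway loop can be killed by the timeout without taking the server down with it.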

## πŸ“Š Performance Architecture

### Scalability Features

1. **Async Architecture**:
   - Non-blocking I/O operations
   - Concurrent agent execution
   - Streaming responses for long operations

2. **Database Optimization**:
   - Connection pooling
   - Query optimization
   - Indexes on frequently accessed columns

3. **Caching Strategy**:
   - In-memory caching for templates
   - Result caching for expensive operations
   - Session data management
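
Concurrent agent execution on top of non-blocking I/O can be illustrated with `asyncio.gather`. In this sketch `fake_agent_call` is a hypothetical stand-in in which `asyncio.sleep` takes the place of an awaited model request.

```python
import asyncio
from typing import Dict, List


async def fake_agent_call(name: str, delay_s: float) -> str:
    """Pretend to call a model; the sleep yields control to other tasks."""
    await asyncio.sleep(delay_s)
    return f"{name}: done"


async def run_agents_concurrently(delays: Dict[str, float]) -> List[str]:
    """Start all agent calls together and wait for every one to finish."""
    calls = [fake_agent_call(name, delay) for name, delay in delays.items()]
    # gather returns results in the order the awaitables were passed in,
    # regardless of which one finishes first.
    return await asyncio.gather(*calls)
```

Total wall-clock time is close to the slowest call and not the sum of all calls, which is the benefit of running independent agents concurrently.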

### Performance Monitoring

1. **Usage Analytics**:
   - Request/response time tracking
   - Token usage monitoring
   - Error rate analysis

2. **Resource Monitoring**:
   - Database query performance
   - Memory usage tracking
   - Agent execution time analysis

## 🚀 Deployment Architecture

### Development Environment

```
Local Development → SQLite Database → File-based Logging →
Direct Model API Calls → Hot Reloading
```

### Production Environment

```
Load Balancer → Multiple FastAPI Instances → PostgreSQL Database →
Centralized Logging → Monitoring & Alerting
```

### Container Architecture

```dockerfile
# Multi-stage build for optimization
FROM python:3.11-slim AS base
# Dependencies and application setup
# Health checks and graceful shutdown
# Environment-specific configurations
```

## 🔄 Integration Patterns

### External Service Integration

1. **AI Model Providers**:
   - Anthropic (Claude)
   - OpenAI (GPT models)
   - Unified interface through DSPy

2. **Database Systems**:
   - PostgreSQL (production)
   - SQLite (development)
   - Migration support through Alembic

### Frontend Integration

1. **REST API**:
   - Standard HTTP endpoints
   - JSON request/response format
   - Session-based communication

2. **Data Exchange**:
   - File upload capabilities
   - Real-time analysis results
   - Report generation and download

### Third-Party Integration

1. **Python Data Science Stack**:
   - Pandas for data manipulation
   - NumPy for numerical computing
   - Scikit-learn for machine learning
   - Plotly for visualization
   - Statsmodels for statistical analysis

2. **Development Tools**:
   - Alembic for database migrations
   - SQLAlchemy for ORM
   - FastAPI for web framework
   - Pydantic for data validation

## 📝 Documentation Architecture

### API Documentation

1. **Auto-generated Docs**: Available at `/docs` endpoint
2. **Schema Definitions**: Pydantic models with descriptions
3. **Endpoint Documentation**: Detailed parameter and response docs

### Code Documentation

1. **Inline Documentation**: Comprehensive docstrings
2. **Architecture Guides**: High-level system design documentation
3. **Getting Started**: Developer onboarding documentation
4. **Troubleshooting**: Common issues and solutions

This architecture keeps the API, business logic, agent, and data layers separate, and it supports both development (SQLite) and production (PostgreSQL) deployments.