auto-analyst-backend / docs /system /database-schema.md
GitHub Actions
Merge branch 'FireBird-Technologies:main' into main
7643a03
# Auto-Analyst Database Schema Documentation
## πŸ“‹ Overview
The Auto-Analyst backend uses a relational database schema designed for scalability and data integrity. The schema supports both **SQLite** (development) and **PostgreSQL** (production) databases through SQLAlchemy ORM.
### **Database Features**
- **User Management** - Authentication and user data
- **Chat System** - Conversation sessions and message history
- **AI Model Tracking** - Usage analytics and cost monitoring
- **Code Execution** - Code generation and execution tracking
- **Agent Templates** - Customizable AI agent configurations
- **Deep Analysis** - Multi-step analysis reports and results
- **User Feedback** - Rating and feedback system
---
## πŸ—„οΈ Database Tables
### **1. Users Table (`users`)**
**Purpose**: Core user authentication and profile management
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `user_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique user identifier |
| `username` | `STRING` | UNIQUE, NOT NULL | User's display name |
| `email` | `STRING` | UNIQUE, NOT NULL | User's email address |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Account creation timestamp |
**Relationships:**
- **One-to-Many**: `chats` (User β†’ Chat sessions)
- **One-to-Many**: `usage_records` (User β†’ Model usage tracking)
- **One-to-Many**: `deep_analysis_reports` (User β†’ Analysis reports)
- **One-to-Many**: `template_preferences` (User β†’ Agent preferences)
---
### **2. Chats Table (`chats`)**
**Purpose**: Conversation sessions and chat organization
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `chat_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique chat session identifier |
| `user_id` | `INTEGER` | FOREIGN KEY β†’ `users.user_id`, CASCADE DELETE | Chat owner (nullable for anonymous) |
| `title` | `STRING` | DEFAULT: 'New Chat' | Human-readable chat title |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Chat creation timestamp |
**Relationships:**
- **Many-to-One**: `user` (Chat β†’ User)
- **One-to-Many**: `messages` (Chat β†’ Messages)
- **One-to-Many**: `usage_records` (Chat β†’ Model usage)
---
### **3. Messages Table (`messages`)**
**Purpose**: Individual messages within chat conversations
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `message_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique message identifier |
| `chat_id` | `INTEGER` | FOREIGN KEY β†’ `chats.chat_id`, CASCADE DELETE | Parent chat session |
| `sender` | `STRING` | NOT NULL | Message sender: 'user' or 'ai' |
| `content` | `TEXT` | NOT NULL | Message content (text/markdown) |
| `timestamp` | `DATETIME` | DEFAULT: UTC NOW | Message creation time |
**Relationships:**
- **Many-to-One**: `chat` (Message β†’ Chat)
- **One-to-One**: `feedback` (Message β†’ Feedback)
---
### **4. Model Usage Table (`model_usage`)**
**Purpose**: AI model usage tracking for analytics and billing
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `usage_id` | `INTEGER` | PRIMARY KEY | Unique usage record identifier |
| `user_id` | `INTEGER` | FOREIGN KEY β†’ `users.user_id`, SET NULL | User who triggered the usage |
| `chat_id` | `INTEGER` | FOREIGN KEY β†’ `chats.chat_id`, SET NULL | Associated chat session |
| `model_name` | `STRING(100)` | NOT NULL | AI model used (e.g., 'gpt-4o-mini') |
| `provider` | `STRING(50)` | NOT NULL | Model provider ('openai', 'anthropic', etc.) |
| `prompt_tokens` | `INTEGER` | DEFAULT: 0 | Input tokens consumed |
| `completion_tokens` | `INTEGER` | DEFAULT: 0 | Output tokens generated |
| `total_tokens` | `INTEGER` | DEFAULT: 0 | Total tokens (input + output) |
| `query_size` | `INTEGER` | DEFAULT: 0 | Query size in characters |
| `response_size` | `INTEGER` | DEFAULT: 0 | Response size in characters |
| `cost` | `FLOAT` | DEFAULT: 0.0 | Cost in USD for this usage |
| `timestamp` | `DATETIME` | DEFAULT: UTC NOW | Usage timestamp |
| `is_streaming` | `BOOLEAN` | DEFAULT: FALSE | Whether response was streamed |
| `request_time_ms` | `INTEGER` | DEFAULT: 0 | Request processing time (milliseconds) |
**Relationships:**
- **Many-to-One**: `user` (Usage β†’ User)
- **Many-to-One**: `chat` (Usage β†’ Chat)
---
### **5. Code Executions Table (`code_executions`)**
**Purpose**: Track code generation and execution attempts
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `execution_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique execution identifier |
| `message_id` | `INTEGER` | FOREIGN KEY β†’ `messages.message_id`, CASCADE DELETE | Associated message |
| `chat_id` | `INTEGER` | FOREIGN KEY β†’ `chats.chat_id`, CASCADE DELETE | Parent chat session |
| `user_id` | `INTEGER` | FOREIGN KEY β†’ `users.user_id`, SET NULL | User who triggered execution |
| `initial_code` | `TEXT` | NULLABLE | First version of generated code |
| `latest_code` | `TEXT` | NULLABLE | Most recent code version |
| `is_successful` | `BOOLEAN` | DEFAULT: FALSE | Whether execution succeeded |
| `output` | `TEXT` | NULLABLE | Execution output (including errors) |
| `model_provider` | `STRING(50)` | NULLABLE | AI model provider used |
| `model_name` | `STRING(100)` | NULLABLE | AI model name used |
| `failed_agents` | `TEXT` | NULLABLE | JSON list of failed agent names |
| `error_messages` | `TEXT` | NULLABLE | JSON map of error messages by agent |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Execution creation time |
| `updated_at` | `DATETIME` | DEFAULT: UTC NOW, ON UPDATE | Last update timestamp |
---
### **6. Message Feedback Table (`message_feedback`)**
**Purpose**: User feedback and model settings for messages
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `feedback_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique feedback identifier |
| `message_id` | `INTEGER` | FOREIGN KEY β†’ `messages.message_id`, CASCADE DELETE | Associated message |
| `rating` | `INTEGER` | NULLABLE | Star rating (1-5 scale) |
| `model_name` | `STRING(100)` | NULLABLE | Model used for this message |
| `model_provider` | `STRING(50)` | NULLABLE | Model provider used |
| `temperature` | `FLOAT` | NULLABLE | Temperature setting used |
| `max_tokens` | `INTEGER` | NULLABLE | Max tokens setting used |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Feedback creation time |
| `updated_at` | `DATETIME` | DEFAULT: UTC NOW, ON UPDATE | Last update timestamp |
**Relationships:**
- **One-to-One**: `message` (Feedback ↔ Message)
---
### **7. Deep Analysis Reports Table (`deep_analysis_reports`)**
**Purpose**: Store comprehensive multi-agent analysis reports
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `report_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique report identifier |
| `report_uuid` | `STRING(100)` | UNIQUE, NOT NULL | Frontend-generated UUID |
| `user_id` | `INTEGER` | FOREIGN KEY β†’ `users.user_id`, CASCADE DELETE | Report owner |
| `goal` | `TEXT` | NOT NULL | Analysis objective/question |
| `status` | `STRING(20)` | NOT NULL, DEFAULT: 'pending' | Status: 'pending', 'running', 'completed', 'failed' |
| `start_time` | `DATETIME` | DEFAULT: UTC NOW | Analysis start time |
| `end_time` | `DATETIME` | NULLABLE | Analysis completion time |
| `duration_seconds` | `INTEGER` | NULLABLE | Total analysis duration |
| `deep_questions` | `TEXT` | NULLABLE | Generated analytical questions |
| `deep_plan` | `TEXT` | NULLABLE | Analysis execution plan |
| `summaries` | `JSON` | NULLABLE | Array of analysis summaries |
| `analysis_code` | `TEXT` | NULLABLE | Generated Python code |
| `plotly_figures` | `JSON` | NULLABLE | Array of Plotly figure data |
| `synthesis` | `JSON` | NULLABLE | Array of synthesis insights |
| `final_conclusion` | `TEXT` | NULLABLE | Final analysis conclusion |
| `html_report` | `TEXT` | NULLABLE | Complete HTML report |
| `progress_percentage` | `INTEGER` | DEFAULT: 0 | Progress percentage (0-100) |
| `total_tokens_used` | `INTEGER` | DEFAULT: 0 | Total tokens consumed |
| `estimated_cost` | `FLOAT` | DEFAULT: 0.0 | Estimated cost in USD |
| `credits_consumed` | `INTEGER` | DEFAULT: 0 | Credits deducted for analysis |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Report creation time |
| `updated_at` | `DATETIME` | DEFAULT: UTC NOW, ON UPDATE | Last update timestamp |
**Relationships:**
- **Many-to-One**: `user` (Report β†’ User)
---
### **8. Agent Templates Table (`agent_templates`)**
**Purpose**: Store predefined AI agent configurations
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `template_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique template identifier |
| `template_name` | `STRING(100)` | UNIQUE, NOT NULL | Internal template name |
| `display_name` | `STRING(200)` | NULLABLE | User-friendly display name |
| `description` | `TEXT` | NOT NULL | Template description |
| `prompt_template` | `TEXT` | NOT NULL | Agent behavior instructions |
| `icon_url` | `STRING(500)` | NULLABLE | Template icon URL |
| `category` | `STRING(50)` | NULLABLE | Template category |
| `is_premium_only` | `BOOLEAN` | DEFAULT: FALSE | Requires premium subscription |
| `variant_type` | `STRING(20)` | DEFAULT: 'individual' | 'planner', 'individual', or 'both' |
| `is_active` | `BOOLEAN` | DEFAULT: TRUE | Template is active/available |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Template creation time |
**Relationships:**
- **One-to-Many**: `user_preferences` (Template β†’ User preferences)
---
### **9. User Template Preferences Table (`user_template_preferences`)**
**Purpose**: Track user preferences and usage for agent templates
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `preference_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique preference identifier |
| `user_id` | `INTEGER` | FOREIGN KEY β†’ `users.user_id`, CASCADE DELETE | User who owns preference |
| `template_id` | `INTEGER` | FOREIGN KEY β†’ `agent_templates.template_id`, CASCADE DELETE | Associated template |
| `is_enabled` | `BOOLEAN` | DEFAULT: TRUE | Whether user has template enabled |
| `usage_count` | `INTEGER` | DEFAULT: 0 | Number of times user used template |
| `last_used_at` | `DATETIME` | NULLABLE | Last time user used template |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Preference creation time |
**Relationships:**
- **Many-to-One**: `user` (Preference β†’ User)
- **Many-to-One**: `template` (Preference β†’ Template)
**Constraints:**
- **Unique**: `(user_id, template_id)` - One preference per user per template
---
## πŸ”— Entity Relationship Diagram
```
Users (1) ──────────── (Many) Chats
β”‚ β”‚
β”‚ β”œβ”€β”€ (Many) Messages
β”‚ β”‚ β”‚
β”‚ β”‚ └── (1) MessageFeedback
β”‚ β”‚
β”‚ └── (Many) CodeExecutions
β”‚
β”œβ”€β”€ (Many) ModelUsage
β”‚
β”œβ”€β”€ (Many) DeepAnalysisReports
β”‚
└── (Many) UserTemplatePreferences
β”‚
└── (Many) AgentTemplates
```
---
## πŸ“Š Database Performance
### **Optimized Indexes**
```sql
-- High-performance queries
CREATE INDEX idx_messages_chat_timestamp ON messages(chat_id, timestamp DESC);
CREATE INDEX idx_model_usage_user_time ON model_usage(user_id, timestamp DESC);
CREATE INDEX idx_model_usage_model_time ON model_usage(model_name, timestamp DESC);
CREATE INDEX idx_reports_user_time ON deep_analysis_reports(user_id, created_at DESC);
```
### **Cascade Deletion Rules**
| Parent β†’ Child | Rule | Description |
|----------------|------|-------------|
| `users` β†’ `chats` | CASCADE | Delete all user chats when user deleted |
| `chats` β†’ `messages` | CASCADE | Delete all chat messages when chat deleted |
| `messages` β†’ `feedback` | CASCADE | Delete feedback when message deleted |
| `users` β†’ `model_usage` | SET NULL | Keep usage records for analytics |
---
## πŸ›‘οΈ Security & Maintenance
### **Data Protection**
- User data isolated by `user_id`
- Sensitive fields require encryption in production
- Automatic cleanup of anonymous data after 90 days
### **Regular Maintenance**
```sql
-- Clean old anonymous chats
DELETE FROM chats WHERE user_id IS NULL AND created_at < DATE_SUB(NOW(), INTERVAL 90 DAY);
-- Update statistics for query optimization
ANALYZE users, chats, messages, model_usage;
```
---
This schema supports the full Auto-Analyst application with optimized performance, data integrity, and scalability for both development and production environments.