Spaces:
Running
on
CPU Upgrade
Running
on
CPU Upgrade
File size: 12,905 Bytes
7643a03 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 |
# Auto-Analyst Database Schema Documentation
## π Overview
The Auto-Analyst backend uses a relational database schema designed for scalability and data integrity. The schema supports both **SQLite** (development) and **PostgreSQL** (production) databases through SQLAlchemy ORM.
### **Database Features**
- **User Management** - Authentication and user data
- **Chat System** - Conversation sessions and message history
- **AI Model Tracking** - Usage analytics and cost monitoring
- **Code Execution** - Code generation and execution tracking
- **Agent Templates** - Customizable AI agent configurations
- **Deep Analysis** - Multi-step analysis reports and results
- **User Feedback** - Rating and feedback system
---
## ποΈ Database Tables
### **1. Users Table (`users`)**
**Purpose**: Core user authentication and profile management
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `user_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique user identifier |
| `username` | `STRING` | UNIQUE, NOT NULL | User's display name |
| `email` | `STRING` | UNIQUE, NOT NULL | User's email address |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Account creation timestamp |
**Relationships:**
- **One-to-Many**: `chats` (User β Chat sessions)
- **One-to-Many**: `usage_records` (User β Model usage tracking)
- **One-to-Many**: `deep_analysis_reports` (User β Analysis reports)
- **One-to-Many**: `template_preferences` (User β Agent preferences)
---
### **2. Chats Table (`chats`)**
**Purpose**: Conversation sessions and chat organization
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `chat_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique chat session identifier |
| `user_id` | `INTEGER` | FOREIGN KEY β `users.user_id`, CASCADE DELETE | Chat owner (nullable for anonymous) |
| `title` | `STRING` | DEFAULT: 'New Chat' | Human-readable chat title |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Chat creation timestamp |
**Relationships:**
- **Many-to-One**: `user` (Chat β User)
- **One-to-Many**: `messages` (Chat β Messages)
- **One-to-Many**: `usage_records` (Chat β Model usage)
---
### **3. Messages Table (`messages`)**
**Purpose**: Individual messages within chat conversations
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `message_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique message identifier |
| `chat_id` | `INTEGER` | FOREIGN KEY β `chats.chat_id`, CASCADE DELETE | Parent chat session |
| `sender` | `STRING` | NOT NULL | Message sender: 'user' or 'ai' |
| `content` | `TEXT` | NOT NULL | Message content (text/markdown) |
| `timestamp` | `DATETIME` | DEFAULT: UTC NOW | Message creation time |
**Relationships:**
- **Many-to-One**: `chat` (Message β Chat)
- **One-to-One**: `feedback` (Message β Feedback)
---
### **4. Model Usage Table (`model_usage`)**
**Purpose**: AI model usage tracking for analytics and billing
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `usage_id` | `INTEGER` | PRIMARY KEY | Unique usage record identifier |
| `user_id` | `INTEGER` | FOREIGN KEY β `users.user_id`, SET NULL | User who triggered the usage |
| `chat_id` | `INTEGER` | FOREIGN KEY β `chats.chat_id`, SET NULL | Associated chat session |
| `model_name` | `STRING(100)` | NOT NULL | AI model used (e.g., 'gpt-4o-mini') |
| `provider` | `STRING(50)` | NOT NULL | Model provider ('openai', 'anthropic', etc.) |
| `prompt_tokens` | `INTEGER` | DEFAULT: 0 | Input tokens consumed |
| `completion_tokens` | `INTEGER` | DEFAULT: 0 | Output tokens generated |
| `total_tokens` | `INTEGER` | DEFAULT: 0 | Total tokens (input + output) |
| `query_size` | `INTEGER` | DEFAULT: 0 | Query size in characters |
| `response_size` | `INTEGER` | DEFAULT: 0 | Response size in characters |
| `cost` | `FLOAT` | DEFAULT: 0.0 | Cost in USD for this usage |
| `timestamp` | `DATETIME` | DEFAULT: UTC NOW | Usage timestamp |
| `is_streaming` | `BOOLEAN` | DEFAULT: FALSE | Whether response was streamed |
| `request_time_ms` | `INTEGER` | DEFAULT: 0 | Request processing time (milliseconds) |
**Relationships:**
- **Many-to-One**: `user` (Usage β User)
- **Many-to-One**: `chat` (Usage β Chat)
---
### **5. Code Executions Table (`code_executions`)**
**Purpose**: Track code generation and execution attempts
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `execution_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique execution identifier |
| `message_id` | `INTEGER` | FOREIGN KEY β `messages.message_id`, CASCADE DELETE | Associated message |
| `chat_id` | `INTEGER` | FOREIGN KEY β `chats.chat_id`, CASCADE DELETE | Parent chat session |
| `user_id` | `INTEGER` | FOREIGN KEY β `users.user_id`, SET NULL | User who triggered execution |
| `initial_code` | `TEXT` | NULLABLE | First version of generated code |
| `latest_code` | `TEXT` | NULLABLE | Most recent code version |
| `is_successful` | `BOOLEAN` | DEFAULT: FALSE | Whether execution succeeded |
| `output` | `TEXT` | NULLABLE | Execution output (including errors) |
| `model_provider` | `STRING(50)` | NULLABLE | AI model provider used |
| `model_name` | `STRING(100)` | NULLABLE | AI model name used |
| `failed_agents` | `TEXT` | NULLABLE | JSON list of failed agent names |
| `error_messages` | `TEXT` | NULLABLE | JSON map of error messages by agent |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Execution creation time |
| `updated_at` | `DATETIME` | DEFAULT: UTC NOW, ON UPDATE | Last update timestamp |
---
### **6. Message Feedback Table (`message_feedback`)**
**Purpose**: User feedback and model settings for messages
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `feedback_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique feedback identifier |
| `message_id` | `INTEGER` | FOREIGN KEY β `messages.message_id`, CASCADE DELETE | Associated message |
| `rating` | `INTEGER` | NULLABLE | Star rating (1-5 scale) |
| `model_name` | `STRING(100)` | NULLABLE | Model used for this message |
| `model_provider` | `STRING(50)` | NULLABLE | Model provider used |
| `temperature` | `FLOAT` | NULLABLE | Temperature setting used |
| `max_tokens` | `INTEGER` | NULLABLE | Max tokens setting used |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Feedback creation time |
| `updated_at` | `DATETIME` | DEFAULT: UTC NOW, ON UPDATE | Last update timestamp |
**Relationships:**
- **One-to-One**: `message` (Feedback β Message)
---
### **7. Deep Analysis Reports Table (`deep_analysis_reports`)**
**Purpose**: Store comprehensive multi-agent analysis reports
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `report_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique report identifier |
| `report_uuid` | `STRING(100)` | UNIQUE, NOT NULL | Frontend-generated UUID |
| `user_id` | `INTEGER` | FOREIGN KEY β `users.user_id`, CASCADE DELETE | Report owner |
| `goal` | `TEXT` | NOT NULL | Analysis objective/question |
| `status` | `STRING(20)` | NOT NULL, DEFAULT: 'pending' | Status: 'pending', 'running', 'completed', 'failed' |
| `start_time` | `DATETIME` | DEFAULT: UTC NOW | Analysis start time |
| `end_time` | `DATETIME` | NULLABLE | Analysis completion time |
| `duration_seconds` | `INTEGER` | NULLABLE | Total analysis duration |
| `deep_questions` | `TEXT` | NULLABLE | Generated analytical questions |
| `deep_plan` | `TEXT` | NULLABLE | Analysis execution plan |
| `summaries` | `JSON` | NULLABLE | Array of analysis summaries |
| `analysis_code` | `TEXT` | NULLABLE | Generated Python code |
| `plotly_figures` | `JSON` | NULLABLE | Array of Plotly figure data |
| `synthesis` | `JSON` | NULLABLE | Array of synthesis insights |
| `final_conclusion` | `TEXT` | NULLABLE | Final analysis conclusion |
| `html_report` | `TEXT` | NULLABLE | Complete HTML report |
| `progress_percentage` | `INTEGER` | DEFAULT: 0 | Progress percentage (0-100) |
| `total_tokens_used` | `INTEGER` | DEFAULT: 0 | Total tokens consumed |
| `estimated_cost` | `FLOAT` | DEFAULT: 0.0 | Estimated cost in USD |
| `credits_consumed` | `INTEGER` | DEFAULT: 0 | Credits deducted for analysis |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Report creation time |
| `updated_at` | `DATETIME` | DEFAULT: UTC NOW, ON UPDATE | Last update timestamp |
**Relationships:**
- **Many-to-One**: `user` (Report β User)
---
### **8. Agent Templates Table (`agent_templates`)**
**Purpose**: Store predefined AI agent configurations
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `template_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique template identifier |
| `template_name` | `STRING(100)` | UNIQUE, NOT NULL | Internal template name |
| `display_name` | `STRING(200)` | NULLABLE | User-friendly display name |
| `description` | `TEXT` | NOT NULL | Template description |
| `prompt_template` | `TEXT` | NOT NULL | Agent behavior instructions |
| `icon_url` | `STRING(500)` | NULLABLE | Template icon URL |
| `category` | `STRING(50)` | NULLABLE | Template category |
| `is_premium_only` | `BOOLEAN` | DEFAULT: FALSE | Requires premium subscription |
| `variant_type` | `STRING(20)` | DEFAULT: 'individual' | 'planner', 'individual', or 'both' |
| `is_active` | `BOOLEAN` | DEFAULT: TRUE | Template is active/available |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Template creation time |
**Relationships:**
- **One-to-Many**: `user_preferences` (Template β User preferences)
---
### **9. User Template Preferences Table (`user_template_preferences`)**
**Purpose**: Track user preferences and usage for agent templates
| Column | Type | Constraints | Description |
|--------|------|-------------|-------------|
| `preference_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique preference identifier |
| `user_id` | `INTEGER` | FOREIGN KEY β `users.user_id`, CASCADE DELETE | User who owns preference |
| `template_id` | `INTEGER` | FOREIGN KEY β `agent_templates.template_id`, CASCADE DELETE | Associated template |
| `is_enabled` | `BOOLEAN` | DEFAULT: TRUE | Whether user has template enabled |
| `usage_count` | `INTEGER` | DEFAULT: 0 | Number of times user used template |
| `last_used_at` | `DATETIME` | NULLABLE | Last time user used template |
| `created_at` | `DATETIME` | DEFAULT: UTC NOW | Preference creation time |
**Relationships:**
- **Many-to-One**: `user` (Preference β User)
- **Many-to-One**: `template` (Preference β Template)
**Constraints:**
- **Unique**: `(user_id, template_id)` - One preference per user per template
---
## π Entity Relationship Diagram
```
Users (1) ββββββββββββ (Many) Chats
β β
β βββ (Many) Messages
β β β
β β βββ (1) MessageFeedback
β β
β βββ (Many) CodeExecutions
β
βββ (Many) ModelUsage
β
βββ (Many) DeepAnalysisReports
β
βββ (Many) UserTemplatePreferences
β
βββ (Many) AgentTemplates
```
---
## π Database Performance
### **Optimized Indexes**
```sql
-- High-performance queries
CREATE INDEX idx_messages_chat_timestamp ON messages(chat_id, timestamp DESC);
CREATE INDEX idx_model_usage_user_time ON model_usage(user_id, timestamp DESC);
CREATE INDEX idx_model_usage_model_time ON model_usage(model_name, timestamp DESC);
CREATE INDEX idx_reports_user_time ON deep_analysis_reports(user_id, created_at DESC);
```
### **Cascade Deletion Rules**
| Parent β Child | Rule | Description |
|----------------|------|-------------|
| `users` β `chats` | CASCADE | Delete all user chats when user deleted |
| `chats` β `messages` | CASCADE | Delete all chat messages when chat deleted |
| `messages` β `feedback` | CASCADE | Delete feedback when message deleted |
| `users` β `model_usage` | SET NULL | Keep usage records for analytics |
---
## π‘οΈ Security & Maintenance
### **Data Protection**
- User data isolated by `user_id`
- Sensitive fields require encryption in production
- Automatic cleanup of anonymous data after 90 days
### **Regular Maintenance**
```sql
-- Clean old anonymous chats
DELETE FROM chats WHERE user_id IS NULL AND created_at < DATE_SUB(NOW(), INTERVAL 90 DAY);
-- Update statistics for query optimization
ANALYZE users, chats, messages, model_usage;
```
---
This schema supports the full Auto-Analyst application with optimized performance, data integrity, and scalability for both development and production environments. |