# Auto-Analyst Database Schema Documentation ## 📋 Overview The Auto-Analyst backend uses a relational database schema designed for scalability and data integrity. The schema supports both **SQLite** (development) and **PostgreSQL** (production) databases through SQLAlchemy ORM. ### **Database Features** - **User Management** - Authentication and user data - **Chat System** - Conversation sessions and message history - **AI Model Tracking** - Usage analytics and cost monitoring - **Code Execution** - Code generation and execution tracking - **Agent Templates** - Customizable AI agent configurations - **Deep Analysis** - Multi-step analysis reports and results - **User Feedback** - Rating and feedback system --- ## 🗄️ Database Tables ### **1. Users Table (`users`)** **Purpose**: Core user authentication and profile management | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | `user_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique user identifier | | `username` | `STRING` | UNIQUE, NOT NULL | User's display name | | `email` | `STRING` | UNIQUE, NOT NULL | User's email address | | `created_at` | `DATETIME` | DEFAULT: UTC NOW | Account creation timestamp | **Relationships:** - **One-to-Many**: `chats` (User → Chat sessions) - **One-to-Many**: `usage_records` (User → Model usage tracking) - **One-to-Many**: `deep_analysis_reports` (User → Analysis reports) - **One-to-Many**: `template_preferences` (User → Agent preferences) --- ### **2. Chats Table (`chats`)** **Purpose**: Conversation sessions and chat organization | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | `chat_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique chat session identifier | | `user_id` | `INTEGER` | FOREIGN KEY → `users.user_id`, CASCADE DELETE | Chat owner (nullable for anonymous) | | `title` | `STRING` | DEFAULT: 'New Chat' | Human-readable chat title | | `created_at` | `DATETIME` | DEFAULT: UTC NOW | Chat creation timestamp | **Relationships:** - **Many-to-One**: `user` (Chat → User) - **One-to-Many**: `messages` (Chat → Messages) - **One-to-Many**: `usage_records` (Chat → Model usage) --- ### **3. Messages Table (`messages`)** **Purpose**: Individual messages within chat conversations | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | `message_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique message identifier | | `chat_id` | `INTEGER` | FOREIGN KEY → `chats.chat_id`, CASCADE DELETE | Parent chat session | | `sender` | `STRING` | NOT NULL | Message sender: 'user' or 'ai' | | `content` | `TEXT` | NOT NULL | Message content (text/markdown) | | `timestamp` | `DATETIME` | DEFAULT: UTC NOW | Message creation time | **Relationships:** - **Many-to-One**: `chat` (Message → Chat) - **One-to-One**: `feedback` (Message → Feedback) --- ### **4. Model Usage Table (`model_usage`)** **Purpose**: AI model usage tracking for analytics and billing | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | `usage_id` | `INTEGER` | PRIMARY KEY | Unique usage record identifier | | `user_id` | `INTEGER` | FOREIGN KEY → `users.user_id`, SET NULL | User who triggered the usage | | `chat_id` | `INTEGER` | FOREIGN KEY → `chats.chat_id`, SET NULL | Associated chat session | | `model_name` | `STRING(100)` | NOT NULL | AI model used (e.g., 'gpt-4o-mini') | | `provider` | `STRING(50)` | NOT NULL | Model provider ('openai', 'anthropic', etc.) | | `prompt_tokens` | `INTEGER` | DEFAULT: 0 | Input tokens consumed | | `completion_tokens` | `INTEGER` | DEFAULT: 0 | Output tokens generated | | `total_tokens` | `INTEGER` | DEFAULT: 0 | Total tokens (input + output) | | `query_size` | `INTEGER` | DEFAULT: 0 | Query size in characters | | `response_size` | `INTEGER` | DEFAULT: 0 | Response size in characters | | `cost` | `FLOAT` | DEFAULT: 0.0 | Cost in USD for this usage | | `timestamp` | `DATETIME` | DEFAULT: UTC NOW | Usage timestamp | | `is_streaming` | `BOOLEAN` | DEFAULT: FALSE | Whether response was streamed | | `request_time_ms` | `INTEGER` | DEFAULT: 0 | Request processing time (milliseconds) | **Relationships:** - **Many-to-One**: `user` (Usage → User) - **Many-to-One**: `chat` (Usage → Chat) --- ### **5. Code Executions Table (`code_executions`)** **Purpose**: Track code generation and execution attempts | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | `execution_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique execution identifier | | `message_id` | `INTEGER` | FOREIGN KEY → `messages.message_id`, CASCADE DELETE | Associated message | | `chat_id` | `INTEGER` | FOREIGN KEY → `chats.chat_id`, CASCADE DELETE | Parent chat session | | `user_id` | `INTEGER` | FOREIGN KEY → `users.user_id`, SET NULL | User who triggered execution | | `initial_code` | `TEXT` | NULLABLE | First version of generated code | | `latest_code` | `TEXT` | NULLABLE | Most recent code version | | `is_successful` | `BOOLEAN` | DEFAULT: FALSE | Whether execution succeeded | | `output` | `TEXT` | NULLABLE | Execution output (including errors) | | `model_provider` | `STRING(50)` | NULLABLE | AI model provider used | | `model_name` | `STRING(100)` | NULLABLE | AI model name used | | `failed_agents` | `TEXT` | NULLABLE | JSON list of failed agent names | | `error_messages` | `TEXT` | NULLABLE | JSON map of error messages by agent | | `created_at` | `DATETIME` | DEFAULT: UTC NOW | Execution creation time | | `updated_at` | `DATETIME` | DEFAULT: UTC NOW, ON UPDATE | Last update timestamp | --- ### **6. Message Feedback Table (`message_feedback`)** **Purpose**: User feedback and model settings for messages | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | `feedback_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique feedback identifier | | `message_id` | `INTEGER` | FOREIGN KEY → `messages.message_id`, CASCADE DELETE | Associated message | | `rating` | `INTEGER` | NULLABLE | Star rating (1-5 scale) | | `model_name` | `STRING(100)` | NULLABLE | Model used for this message | | `model_provider` | `STRING(50)` | NULLABLE | Model provider used | | `temperature` | `FLOAT` | NULLABLE | Temperature setting used | | `max_tokens` | `INTEGER` | NULLABLE | Max tokens setting used | | `created_at` | `DATETIME` | DEFAULT: UTC NOW | Feedback creation time | | `updated_at` | `DATETIME` | DEFAULT: UTC NOW, ON UPDATE | Last update timestamp | **Relationships:** - **One-to-One**: `message` (Feedback ↔ Message) --- ### **7. Deep Analysis Reports Table (`deep_analysis_reports`)** **Purpose**: Store comprehensive multi-agent analysis reports | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | `report_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique report identifier | | `report_uuid` | `STRING(100)` | UNIQUE, NOT NULL | Frontend-generated UUID | | `user_id` | `INTEGER` | FOREIGN KEY → `users.user_id`, CASCADE DELETE | Report owner | | `goal` | `TEXT` | NOT NULL | Analysis objective/question | | `status` | `STRING(20)` | NOT NULL, DEFAULT: 'pending' | Status: 'pending', 'running', 'completed', 'failed' | | `start_time` | `DATETIME` | DEFAULT: UTC NOW | Analysis start time | | `end_time` | `DATETIME` | NULLABLE | Analysis completion time | | `duration_seconds` | `INTEGER` | NULLABLE | Total analysis duration | | `deep_questions` | `TEXT` | NULLABLE | Generated analytical questions | | `deep_plan` | `TEXT` | NULLABLE | Analysis execution plan | | `summaries` | `JSON` | NULLABLE | Array of analysis summaries | | `analysis_code` | `TEXT` | NULLABLE | Generated Python code | | `plotly_figures` | `JSON` | NULLABLE | Array of Plotly figure data | | `synthesis` | `JSON` | NULLABLE | Array of synthesis insights | | `final_conclusion` | `TEXT` | NULLABLE | Final analysis conclusion | | `html_report` | `TEXT` | NULLABLE | Complete HTML report | | `progress_percentage` | `INTEGER` | DEFAULT: 0 | Progress percentage (0-100) | | `total_tokens_used` | `INTEGER` | DEFAULT: 0 | Total tokens consumed | | `estimated_cost` | `FLOAT` | DEFAULT: 0.0 | Estimated cost in USD | | `credits_consumed` | `INTEGER` | DEFAULT: 0 | Credits deducted for analysis | | `created_at` | `DATETIME` | DEFAULT: UTC NOW | Report creation time | | `updated_at` | `DATETIME` | DEFAULT: UTC NOW, ON UPDATE | Last update timestamp | **Relationships:** - **Many-to-One**: `user` (Report → User) --- ### **8. Agent Templates Table (`agent_templates`)** **Purpose**: Store predefined AI agent configurations | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | `template_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique template identifier | | `template_name` | `STRING(100)` | UNIQUE, NOT NULL | Internal template name | | `display_name` | `STRING(200)` | NULLABLE | User-friendly display name | | `description` | `TEXT` | NOT NULL | Template description | | `prompt_template` | `TEXT` | NOT NULL | Agent behavior instructions | | `icon_url` | `STRING(500)` | NULLABLE | Template icon URL | | `category` | `STRING(50)` | NULLABLE | Template category | | `is_premium_only` | `BOOLEAN` | DEFAULT: FALSE | Requires premium subscription | | `variant_type` | `STRING(20)` | DEFAULT: 'individual' | 'planner', 'individual', or 'both' | | `is_active` | `BOOLEAN` | DEFAULT: TRUE | Template is active/available | | `created_at` | `DATETIME` | DEFAULT: UTC NOW | Template creation time | **Relationships:** - **One-to-Many**: `user_preferences` (Template → User preferences) --- ### **9. User Template Preferences Table (`user_template_preferences`)** **Purpose**: Track user preferences and usage for agent templates | Column | Type | Constraints | Description | |--------|------|-------------|-------------| | `preference_id` | `INTEGER` | PRIMARY KEY, AUTO INCREMENT | Unique preference identifier | | `user_id` | `INTEGER` | FOREIGN KEY → `users.user_id`, CASCADE DELETE | User who owns preference | | `template_id` | `INTEGER` | FOREIGN KEY → `agent_templates.template_id`, CASCADE DELETE | Associated template | | `is_enabled` | `BOOLEAN` | DEFAULT: TRUE | Whether user has template enabled | | `usage_count` | `INTEGER` | DEFAULT: 0 | Number of times user used template | | `last_used_at` | `DATETIME` | NULLABLE | Last time user used template | | `created_at` | `DATETIME` | DEFAULT: UTC NOW | Preference creation time | **Relationships:** - **Many-to-One**: `user` (Preference → User) - **Many-to-One**: `template` (Preference → Template) **Constraints:** - **Unique**: `(user_id, template_id)` - One preference per user per template --- ## 🔗 Entity Relationship Diagram ``` Users (1) ──────────── (Many) Chats │ │ │ ├── (Many) Messages │ │ │ │ │ └── (1) MessageFeedback │ │ │ └── (Many) CodeExecutions │ ├── (Many) ModelUsage │ ├── (Many) DeepAnalysisReports │ └── (Many) UserTemplatePreferences │ └── (Many) AgentTemplates ``` --- ## 📊 Database Performance ### **Optimized Indexes** ```sql -- High-performance queries CREATE INDEX idx_messages_chat_timestamp ON messages(chat_id, timestamp DESC); CREATE INDEX idx_model_usage_user_time ON model_usage(user_id, timestamp DESC); CREATE INDEX idx_model_usage_model_time ON model_usage(model_name, timestamp DESC); CREATE INDEX idx_reports_user_time ON deep_analysis_reports(user_id, created_at DESC); ``` ### **Cascade Deletion Rules** | Parent → Child | Rule | Description | |----------------|------|-------------| | `users` → `chats` | CASCADE | Delete all user chats when user deleted | | `chats` → `messages` | CASCADE | Delete all chat messages when chat deleted | | `messages` → `feedback` | CASCADE | Delete feedback when message deleted | | `users` → `model_usage` | SET NULL | Keep usage records for analytics | --- ## 🛡️ Security & Maintenance ### **Data Protection** - User data isolated by `user_id` - Sensitive fields require encryption in production - Automatic cleanup of anonymous data after 90 days ### **Regular Maintenance** ```sql -- Clean old anonymous chats DELETE FROM chats WHERE user_id IS NULL AND created_at < DATE_SUB(NOW(), INTERVAL 90 DAY); -- Update statistics for query optimization ANALYZE users, chats, messages, model_usage; ``` --- This schema supports the full Auto-Analyst application with optimized performance, data integrity, and scalability for both development and production environments.