auto-analyst-backend / docs /system /database-schema.md
GitHub Actions
Merge branch 'FireBird-Technologies:main' into main
7643a03

Auto-Analyst Database Schema Documentation

πŸ“‹ Overview

The Auto-Analyst backend uses a relational database schema designed for scalability and data integrity. The schema supports both SQLite (development) and PostgreSQL (production) databases through SQLAlchemy ORM.

Database Features

  • User Management - Authentication and user data
  • Chat System - Conversation sessions and message history
  • AI Model Tracking - Usage analytics and cost monitoring
  • Code Execution - Code generation and execution tracking
  • Agent Templates - Customizable AI agent configurations
  • Deep Analysis - Multi-step analysis reports and results
  • User Feedback - Rating and feedback system

πŸ—„οΈ Database Tables

1. Users Table (users)

Purpose: Core user authentication and profile management

Column Type Constraints Description
user_id INTEGER PRIMARY KEY, AUTO INCREMENT Unique user identifier
username STRING UNIQUE, NOT NULL User's display name
email STRING UNIQUE, NOT NULL User's email address
created_at DATETIME DEFAULT: UTC NOW Account creation timestamp

Relationships:

  • One-to-Many: chats (User β†’ Chat sessions)
  • One-to-Many: usage_records (User β†’ Model usage tracking)
  • One-to-Many: deep_analysis_reports (User β†’ Analysis reports)
  • One-to-Many: template_preferences (User β†’ Agent preferences)

2. Chats Table (chats)

Purpose: Conversation sessions and chat organization

Column Type Constraints Description
chat_id INTEGER PRIMARY KEY, AUTO INCREMENT Unique chat session identifier
user_id INTEGER FOREIGN KEY β†’ users.user_id, CASCADE DELETE Chat owner (nullable for anonymous)
title STRING DEFAULT: 'New Chat' Human-readable chat title
created_at DATETIME DEFAULT: UTC NOW Chat creation timestamp

Relationships:

  • Many-to-One: user (Chat β†’ User)
  • One-to-Many: messages (Chat β†’ Messages)
  • One-to-Many: usage_records (Chat β†’ Model usage)

3. Messages Table (messages)

Purpose: Individual messages within chat conversations

Column Type Constraints Description
message_id INTEGER PRIMARY KEY, AUTO INCREMENT Unique message identifier
chat_id INTEGER FOREIGN KEY β†’ chats.chat_id, CASCADE DELETE Parent chat session
sender STRING NOT NULL Message sender: 'user' or 'ai'
content TEXT NOT NULL Message content (text/markdown)
timestamp DATETIME DEFAULT: UTC NOW Message creation time

Relationships:

  • Many-to-One: chat (Message β†’ Chat)
  • One-to-One: feedback (Message β†’ Feedback)

4. Model Usage Table (model_usage)

Purpose: AI model usage tracking for analytics and billing

Column Type Constraints Description
usage_id INTEGER PRIMARY KEY Unique usage record identifier
user_id INTEGER FOREIGN KEY β†’ users.user_id, SET NULL User who triggered the usage
chat_id INTEGER FOREIGN KEY β†’ chats.chat_id, SET NULL Associated chat session
model_name STRING(100) NOT NULL AI model used (e.g., 'gpt-4o-mini')
provider STRING(50) NOT NULL Model provider ('openai', 'anthropic', etc.)
prompt_tokens INTEGER DEFAULT: 0 Input tokens consumed
completion_tokens INTEGER DEFAULT: 0 Output tokens generated
total_tokens INTEGER DEFAULT: 0 Total tokens (input + output)
query_size INTEGER DEFAULT: 0 Query size in characters
response_size INTEGER DEFAULT: 0 Response size in characters
cost FLOAT DEFAULT: 0.0 Cost in USD for this usage
timestamp DATETIME DEFAULT: UTC NOW Usage timestamp
is_streaming BOOLEAN DEFAULT: FALSE Whether response was streamed
request_time_ms INTEGER DEFAULT: 0 Request processing time (milliseconds)

Relationships:

  • Many-to-One: user (Usage β†’ User)
  • Many-to-One: chat (Usage β†’ Chat)

5. Code Executions Table (code_executions)

Purpose: Track code generation and execution attempts

Column Type Constraints Description
execution_id INTEGER PRIMARY KEY, AUTO INCREMENT Unique execution identifier
message_id INTEGER FOREIGN KEY β†’ messages.message_id, CASCADE DELETE Associated message
chat_id INTEGER FOREIGN KEY β†’ chats.chat_id, CASCADE DELETE Parent chat session
user_id INTEGER FOREIGN KEY β†’ users.user_id, SET NULL User who triggered execution
initial_code TEXT NULLABLE First version of generated code
latest_code TEXT NULLABLE Most recent code version
is_successful BOOLEAN DEFAULT: FALSE Whether execution succeeded
output TEXT NULLABLE Execution output (including errors)
model_provider STRING(50) NULLABLE AI model provider used
model_name STRING(100) NULLABLE AI model name used
failed_agents TEXT NULLABLE JSON list of failed agent names
error_messages TEXT NULLABLE JSON map of error messages by agent
created_at DATETIME DEFAULT: UTC NOW Execution creation time
updated_at DATETIME DEFAULT: UTC NOW, ON UPDATE Last update timestamp

6. Message Feedback Table (message_feedback)

Purpose: User feedback and model settings for messages

Column Type Constraints Description
feedback_id INTEGER PRIMARY KEY, AUTO INCREMENT Unique feedback identifier
message_id INTEGER FOREIGN KEY β†’ messages.message_id, CASCADE DELETE Associated message
rating INTEGER NULLABLE Star rating (1-5 scale)
model_name STRING(100) NULLABLE Model used for this message
model_provider STRING(50) NULLABLE Model provider used
temperature FLOAT NULLABLE Temperature setting used
max_tokens INTEGER NULLABLE Max tokens setting used
created_at DATETIME DEFAULT: UTC NOW Feedback creation time
updated_at DATETIME DEFAULT: UTC NOW, ON UPDATE Last update timestamp

Relationships:

  • One-to-One: message (Feedback ↔ Message)

7. Deep Analysis Reports Table (deep_analysis_reports)

Purpose: Store comprehensive multi-agent analysis reports

Column Type Constraints Description
report_id INTEGER PRIMARY KEY, AUTO INCREMENT Unique report identifier
report_uuid STRING(100) UNIQUE, NOT NULL Frontend-generated UUID
user_id INTEGER FOREIGN KEY β†’ users.user_id, CASCADE DELETE Report owner
goal TEXT NOT NULL Analysis objective/question
status STRING(20) NOT NULL, DEFAULT: 'pending' Status: 'pending', 'running', 'completed', 'failed'
start_time DATETIME DEFAULT: UTC NOW Analysis start time
end_time DATETIME NULLABLE Analysis completion time
duration_seconds INTEGER NULLABLE Total analysis duration
deep_questions TEXT NULLABLE Generated analytical questions
deep_plan TEXT NULLABLE Analysis execution plan
summaries JSON NULLABLE Array of analysis summaries
analysis_code TEXT NULLABLE Generated Python code
plotly_figures JSON NULLABLE Array of Plotly figure data
synthesis JSON NULLABLE Array of synthesis insights
final_conclusion TEXT NULLABLE Final analysis conclusion
html_report TEXT NULLABLE Complete HTML report
progress_percentage INTEGER DEFAULT: 0 Progress percentage (0-100)
total_tokens_used INTEGER DEFAULT: 0 Total tokens consumed
estimated_cost FLOAT DEFAULT: 0.0 Estimated cost in USD
credits_consumed INTEGER DEFAULT: 0 Credits deducted for analysis
created_at DATETIME DEFAULT: UTC NOW Report creation time
updated_at DATETIME DEFAULT: UTC NOW, ON UPDATE Last update timestamp

Relationships:

  • Many-to-One: user (Report β†’ User)

8. Agent Templates Table (agent_templates)

Purpose: Store predefined AI agent configurations

Column Type Constraints Description
template_id INTEGER PRIMARY KEY, AUTO INCREMENT Unique template identifier
template_name STRING(100) UNIQUE, NOT NULL Internal template name
display_name STRING(200) NULLABLE User-friendly display name
description TEXT NOT NULL Template description
prompt_template TEXT NOT NULL Agent behavior instructions
icon_url STRING(500) NULLABLE Template icon URL
category STRING(50) NULLABLE Template category
is_premium_only BOOLEAN DEFAULT: FALSE Requires premium subscription
variant_type STRING(20) DEFAULT: 'individual' 'planner', 'individual', or 'both'
is_active BOOLEAN DEFAULT: TRUE Template is active/available
created_at DATETIME DEFAULT: UTC NOW Template creation time

Relationships:

  • One-to-Many: user_preferences (Template β†’ User preferences)

9. User Template Preferences Table (user_template_preferences)

Purpose: Track user preferences and usage for agent templates

Column Type Constraints Description
preference_id INTEGER PRIMARY KEY, AUTO INCREMENT Unique preference identifier
user_id INTEGER FOREIGN KEY β†’ users.user_id, CASCADE DELETE User who owns preference
template_id INTEGER FOREIGN KEY β†’ agent_templates.template_id, CASCADE DELETE Associated template
is_enabled BOOLEAN DEFAULT: TRUE Whether user has template enabled
usage_count INTEGER DEFAULT: 0 Number of times user used template
last_used_at DATETIME NULLABLE Last time user used template
created_at DATETIME DEFAULT: UTC NOW Preference creation time

Relationships:

  • Many-to-One: user (Preference β†’ User)
  • Many-to-One: template (Preference β†’ Template)

Constraints:

  • Unique: (user_id, template_id) - One preference per user per template

πŸ”— Entity Relationship Diagram

Users (1) ──────────── (Many) Chats
  β”‚                           β”‚
  β”‚                           β”œβ”€β”€ (Many) Messages
  β”‚                           β”‚      β”‚
  β”‚                           β”‚      └── (1) MessageFeedback
  β”‚                           β”‚
  β”‚                           └── (Many) CodeExecutions
  β”‚
  β”œβ”€β”€ (Many) ModelUsage
  β”‚
  β”œβ”€β”€ (Many) DeepAnalysisReports
  β”‚
  └── (Many) UserTemplatePreferences
               β”‚
               └── (Many) AgentTemplates

πŸ“Š Database Performance

Optimized Indexes

-- High-performance queries
CREATE INDEX idx_messages_chat_timestamp ON messages(chat_id, timestamp DESC);
CREATE INDEX idx_model_usage_user_time ON model_usage(user_id, timestamp DESC);
CREATE INDEX idx_model_usage_model_time ON model_usage(model_name, timestamp DESC);
CREATE INDEX idx_reports_user_time ON deep_analysis_reports(user_id, created_at DESC);

Cascade Deletion Rules

Parent β†’ Child Rule Description
users β†’ chats CASCADE Delete all user chats when user deleted
chats β†’ messages CASCADE Delete all chat messages when chat deleted
messages β†’ feedback CASCADE Delete feedback when message deleted
users β†’ model_usage SET NULL Keep usage records for analytics

πŸ›‘οΈ Security & Maintenance

Data Protection

  • User data isolated by user_id
  • Sensitive fields require encryption in production
  • Automatic cleanup of anonymous data after 90 days

Regular Maintenance

-- Clean old anonymous chats
DELETE FROM chats WHERE user_id IS NULL AND created_at < DATE_SUB(NOW(), INTERVAL 90 DAY);

-- Update statistics for query optimization
ANALYZE users, chats, messages, model_usage;

This schema supports the full Auto-Analyst application with optimized performance, data integrity, and scalability for both development and production environments.