# 🏗️ RepoRover System Architecture
RepoRover is an AI-powered code analysis platform that provides deep insights into GitHub repositories. The system is built on a modern, scalable architecture that combines FastAPI for the backend, AI models for code understanding, and a clean, responsive frontend.
## 🔑 Core Principles
- **Modular Design**: Components are loosely coupled and follow the single responsibility principle
- **Extensible**: Easy to add new analysis modules or integrate with different AI models
- **Real-time Processing**: Provides immediate feedback during repository analysis
- **Scalable**: Designed to handle repositories of various sizes efficiently
## 🧩 Core Components
### 1. Backend Services
- **FastAPI Application**: Handles HTTP requests and serves the frontend
- **Background Task Queue**: Manages long-running repository analysis tasks
- **API Endpoints**:
  - `/ingest`: Start repository ingestion
  - `/ingest/status/{task_id}`: Check ingestion status
  - `/query`: Submit questions about the repository
### 2. AI Components
- **Dispatcher Agent**: Orchestrates the analysis workflow
- **Semantic Memory Manager**: Handles storage and retrieval of code knowledge
- **AI Model Integrations**: Support for multiple AI providers (Gemini, Groq)
### 3. Frontend
- **Single Page Application**: Built with vanilla JavaScript
- **Responsive UI**: Using Tailwind CSS for styling
- **Real-time Updates**: WebSocket-based updates for long-running tasks
### 4. Data Storage
- **Semantic Memory**: Stores processed code information
- **Vector Database**: For efficient similarity search of code patterns
- **Task Status Tracking**: In-memory storage for monitoring analysis progress
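The in-memory task-status tracking can be sketched in a few lines of Python. The `TaskTracker` class and its state names below are illustrative, not RepoRover's actual implementation; they only show the lifecycle that `/ingest` and `/ingest/status/{task_id}` would expose:

```python
import uuid
from dataclasses import dataclass
from typing import Dict

@dataclass
class TaskStatus:
    state: str = "pending"   # pending -> running -> complete | failed
    detail: str = ""

class TaskTracker:
    """Hypothetical in-memory store keyed by task id."""

    def __init__(self) -> None:
        self._tasks: Dict[str, TaskStatus] = {}

    def create(self) -> str:
        # /ingest would call this and return the id to the client
        task_id = uuid.uuid4().hex
        self._tasks[task_id] = TaskStatus()
        return task_id

    def update(self, task_id: str, state: str, detail: str = "") -> None:
        self._tasks[task_id] = TaskStatus(state, detail)

    def get(self, task_id: str) -> TaskStatus:
        # /ingest/status/{task_id} would serialize this result
        return self._tasks.get(task_id, TaskStatus("unknown"))

tracker = TaskTracker()
tid = tracker.create()
tracker.update(tid, "running", "cloning repository")
print(tracker.get(tid).state)
```

Because the store is in-memory, task state is lost on restart; a production deployment would likely move this behind Redis or a database.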
## 🔄 Ingestion Workflow
The ingestion process transforms a GitHub repository into a structured knowledge base that can be queried naturally.
### Trigger
- User submits a GitHub repository URL through the web interface
### Process Flow
1. **Repository Cloning**
   - Clones the target repository locally
   - Scans the repository structure
   - Identifies different file types and their relationships
2. **Code Analysis**
   - Parses source code files
   - Extracts functions, classes, and their documentation
   - Builds a semantic understanding of the codebase
   - Identifies dependencies between components
3. **Knowledge Base Population**
   - Stores extracted information in the semantic memory
   - Generates vector embeddings for semantic search
   - Builds a knowledge graph of the codebase
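As a rough illustration of the code-analysis step, Python's standard `ast` module can pull functions, classes, and their docstrings out of a single source file. This is only a minimal sketch for Python sources; the `extract_entities` helper is hypothetical, and the real pipeline would cover more languages and relationships:

```python
import ast

def extract_entities(source: str):
    """Return (kind, name, docstring) for every function/class definition."""
    entities = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            entities.append((kind, node.name, ast.get_docstring(node) or ""))
    return entities

sample = '''
class Repo:
    """A repository wrapper."""
    def clone(self):
        """Clone the repo locally."""
'''
print(extract_entities(sample))
```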
```mermaid
graph TD
    A[Start: GitHub URL] --> B(Dispatcher Agent);
    B --> C{Clones Repo & Scans Files};
    C --> D[Architect Agent];
    D --> E[Librarian Agent];
    E --> F[Annotator Agent];
    subgraph Semantic Memory
        G[Entity Store - SQLite];
        H[Knowledge Graph - NetworkX];
        I[Vector Store - ChromaDB];
    end
    D -- Creates Code Entities & Relationships --> H;
    D -- Stores Code Details --> G;
    E -- Creates Doc Chunks --> I;
    E -- Stores Doc Details --> G;
    F -- Generates Summaries --> G;
    F -- Updates Embeddings --> I;
    F --> J[End: Ingestion Complete];
```
## 💬 Query Processing Workflow
### Trigger
- User submits a natural language question about the codebase
### Process Flow
1. **Query Understanding**
   - Analyzes the user's question
   - Identifies key concepts and intents
   - Determines relevant parts of the codebase to examine
2. **Context Retrieval**
   - Searches the semantic memory for relevant code snippets
   - Retrieves related documentation and examples
   - Gathers contextual information about the code
3. **Response Generation**
   - Formulates a comprehensive answer using AI
   - Includes relevant code examples
   - Provides additional context and suggestions
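A toy version of the context-retrieval step can be written with a bag-of-words vector standing in for the learned embeddings a ChromaDB deployment would store. The `retrieve` helper and the snippet texts are made up for illustration:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Bag-of-words "embedding"; real systems use a learned model
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question: str, snippets: list[str], k: int = 2) -> list[str]:
    # Rank stored snippets by similarity to the question, keep top k
    q = embed(question)
    return sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)[:k]

snippets = [
    "def clone_repository(url): clones a git repo locally",
    "class TaskTracker: tracks ingestion task status",
    "def generate_embeddings(chunks): vector embeddings for search",
]
print(retrieve("how does the repo get cloned", snippets, k=1))
```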
## 🚀 Deployment Architecture
```
┌──────────────────┐      ┌─────────────────────┐      ┌────────────────────┐
│                  │      │                     │      │                    │
│  User's Browser  │─────►│   FastAPI Backend   │─────►│     AI Models      │
│                  │      │      (Python)       │      │   (Gemini, Groq)   │
└──────────────────┘      └──────────┬──────────┘      └────────────────────┘
                                     │
                                     ▼
                          ┌─────────────────────┐
                          │                     │
                          │   Semantic Memory   │
                          │     (ChromaDB)      │
                          │                     │
                          └─────────────────────┘
```
## 📊 Data Flow
1. **Ingestion Path**
   - GitHub Repo → FastAPI → Background Task → AI Processing → Semantic Memory
2. **Query Path**
   - User Question → FastAPI → AI Model → Semantic Memory → Response Generation → User
```mermaid
graph TD
    A[Start: User Question] --> B(Dispatcher Agent);
    B -- Assembles Cognitive Context --> C[Query Planner Agent];
    subgraph Cognitive Context
        D[Episodic Memory - History];
        E[Core Memory - Persona];
    end
    D --> B;
    E --> B;
    C -- Creates Plan --> F[Information Retriever Agent];
    F -- Executes Plan --> G((Semantic Memory));
    G -- Returns Data --> H[Synthesizer Agent];
    H -- Generates Response --> I[End: Final Answer];
```
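The query path in the diagram above can be sketched as composed stages. The functions below are stubs with hypothetical names standing in for the Query Planner, Information Retriever, and Synthesizer agents; they show only how the Dispatcher might chain the stages, not RepoRover's real logic:

```python
def plan(question: str) -> dict:
    # Query Planner: decide what to look up (stubbed)
    return {"question": question, "targets": ["semantic_memory"]}

def retrieve_context(state: dict) -> dict:
    # Information Retriever: would query the semantic memory; canned here
    return {**state, "context": ["def ingest(url): ..."]}

def synthesize(state: dict) -> str:
    # Synthesizer: would call an AI model with question + context
    return f"Answer to {state['question']!r} using {len(state['context'])} snippet(s)"

def answer(question: str) -> str:
    # Dispatcher: chain the stages end to end
    return synthesize(retrieve_context(plan(question)))

print(answer("How does ingestion start?"))
```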