|
# ποΈ RepoRover System Architecture |
|
|
|
RepoRover is an AI-powered code analysis platform that provides deep insights into GitHub repositories. The system is built on a modern, scalable architecture that combines FastAPI for the backend, AI models for code understanding, and a clean, responsive frontend. |
|
|
|
## π Core Principles |
|
|
|
- **Modular Design**: Components are loosely coupled and follow the single responsibility principle |
|
- **Extensible**: Easy to add new analysis modules or integrate with different AI models |
|
- **Real-time Processing**: Provides immediate feedback during repository analysis |
|
- **Scalable**: Designed to handle repositories of various sizes efficiently |
|
|
|
## π§© Core Components |
|
|
|
### 1. Backend Services |
|
- **FastAPI Application**: Handles HTTP requests and serves the frontend |
|
- **Background Task Queue**: Manages long-running repository analysis tasks |
|
- **API Endpoints**: |
|
- `/ingest`: Start repository ingestion |
|
- `/ingest/status/{task_id}`: Check ingestion status |
|
- `/query`: Submit questions about the repository |
|
|
|
### 2. AI Components |
|
- **Dispatcher Agent**: Orchestrates the analysis workflow |
|
- **Semantic Memory Manager**: Handles storage and retrieval of code knowledge |
|
- **AI Model Integrations**: Support for multiple AI providers (Gemini, Groq) |
|
|
|
### 3. Frontend |
|
- **Single Page Application**: Built with vanilla JavaScript |
|
- **Responsive UI**: Using Tailwind CSS for styling |
|
- **Real-time Updates**: WebSocket-based updates for long-running tasks |
|
|
|
### 4. Data Storage |
|
- **Semantic Memory**: Stores processed code information |
|
- **Vector Database**: For efficient similarity search of code patterns |
|
- **Task Status Tracking**: In-memory storage for monitoring analysis progress |
|
|
|
## π Ingestion Workflow |
|
|
|
The ingestion process transforms a GitHub repository into a structured knowledge base that can be queried naturally. |
|
|
|
### Trigger |
|
- User submits a GitHub repository URL through the web interface |
|
|
|
### Process Flow |
|
1. **Repository Cloning** |
|
- Clones the target repository locally |
|
- Scans the repository structure |
|
- Identifies different file types and their relationships |
|
|
|
2. **Code Analysis** |
|
- Parses source code files |
|
- Extracts functions, classes, and their documentation |
|
- Builds a semantic understanding of the codebase |
|
- Identifies dependencies between components |
|
|
|
3. **Knowledge Base Population** |
|
- Stores extracted information in the semantic memory |
|
- Generates vector embeddings for semantic search |
|
- Builds a knowledge graph of the codebase |
|
|
|
```mermaid |
|
graph TD |
|
A[Start: GitHub URL] --> B(Dispatcher Agent); |
|
B --> C{Clones Repo & Scans Files}; |
|
C --> D[Architect Agent]; |
|
D --> E[Librarian Agent]; |
|
E --> F[Annotator Agent]; |
|
|
|
subgraph Semantic Memory |
|
G[Entity Store - SQLite]; |
|
H[Knowledge Graph - NetworkX]; |
|
I[Vector Store - ChromaDB]; |
|
end |
|
|
|
D -- Creates Code Entities & Relationships --> H; |
|
D -- Stores Code Details --> G; |
|
E -- Creates Doc Chunks --> I; |
|
E -- Stores Doc Details --> G; |
|
F -- Generates Summaries --> G; |
|
F -- Updates Embeddings --> I; |
|
|
|
F --> J[End: Ingestion Complete]; |
|
``` |
|
|
|
## π¬ Query Processing Workflow |
|
|
|
### Trigger |
|
- User submits a natural language question about the codebase |
|
|
|
### Process Flow |
|
1. **Query Understanding** |
|
- Analyzes the user's question |
|
- Identifies key concepts and intents |
|
- Determines relevant parts of the codebase to examine |
|
|
|
2. **Context Retrieval** |
|
- Searches the semantic memory for relevant code snippets |
|
- Retrieves related documentation and examples |
|
- Gathers contextual information about the code |
|
|
|
3. **Response Generation** |
|
- Formulates a comprehensive answer using AI |
|
- Includes relevant code examples |
|
- Provides additional context and suggestions |
|
|
|
## π Deployment Architecture |
|
|
|
``` |
|
βββββββββββββββββββ βββββββββββββββββββββββ ββββββββββββββββββββ |
|
β β β β β β |
|
β User's Browser ββββββΊβ FastAPI Backend ββββββΊβ AI Models β |
|
β β β (Python) β β (Gemini, Groq) β |
|
βββββββββββββββββββ βββββββββββ¬ββββββββββββ ββββββββββββββββββββ |
|
β |
|
βΌ |
|
βββββββββββββββββββββ |
|
β β |
|
β Semantic Memory β |
|
β (ChromaDB) β |
|
β β |
|
βββββββββββββββββββββ |
|
``` |
|
|
|
## π Data Flow |
|
|
|
1. **Ingestion Path** |
|
- GitHub Repo β FastAPI β Background Task β AI Processing β Semantic Memory |
|
|
|
2. **Query Path** |
|
- User Question β FastAPI β AI Model β Semantic Memory β Response Generation β User |
|
|
|
```mermaid |
|
graph TD |
|
A[Start: User Question] --> B(Dispatcher Agent); |
|
B -- Assembles Cognitive Context --> C[Query Planner Agent]; |
|
|
|
subgraph Cognitive Context |
|
D[Episodic Memory - History]; |
|
E[Core Memory - Persona]; |
|
end |
|
|
|
D --> B; |
|
E --> B; |
|
|
|
C -- Creates Plan --> F[Information Retriever Agent]; |
|
F -- Executes Plan --> G((Semantic Memory)); |
|
G -- Returns Data --> H[Synthesizer Agent]; |
|
H -- Generates Response --> I[End: Final Answer]; |
|
``` |