# πŸ—οΈ RepoRover System Architecture

RepoRover is an AI-powered code analysis platform that provides deep insights into GitHub repositories. The system is built on a modern, scalable architecture that combines FastAPI for the backend, AI models for code understanding, and a clean, responsive frontend.

## 🌟 Core Principles

- **Modular Design**: Components are loosely coupled and follow the single responsibility principle
- **Extensible**: Easy to add new analysis modules or integrate with different AI models
- **Real-time Processing**: Provides immediate feedback during repository analysis
- **Scalable**: Designed to handle repositories of various sizes efficiently

## 🧩 Core Components

### 1. Backend Services
- **FastAPI Application**: Handles HTTP requests and serves the frontend
- **Background Task Queue**: Manages long-running repository analysis tasks
- **API Endpoints**: 
  - `/ingest`: Start repository ingestion
  - `/ingest/status/{task_id}`: Check ingestion status
  - `/query`: Submit questions about the repository

### 2. AI Components
- **Dispatcher Agent**: Orchestrates the analysis workflow
- **Semantic Memory Manager**: Handles storage and retrieval of code knowledge
- **AI Model Integrations**: Support for multiple AI providers (Gemini, Groq)

### 3. Frontend
- **Single Page Application**: Built with vanilla JavaScript
- **Responsive UI**: Using Tailwind CSS for styling
- **Real-time Updates**: WebSocket-based updates for long-running tasks

### 4. Data Storage
- **Semantic Memory**: Stores processed code information
- **Vector Database**: For efficient similarity search of code patterns
- **Task Status Tracking**: In-memory storage for monitoring analysis progress

## πŸ”„ Ingestion Workflow

The ingestion process transforms a GitHub repository into a structured knowledge base that can be queried in natural language.

### Trigger
- User submits a GitHub repository URL through the web interface

### Process Flow
1. **Repository Cloning**
   - Clones the target repository locally
   - Scans the repository structure
   - Identifies different file types and their relationships

2. **Code Analysis**
   - Parses source code files
   - Extracts functions, classes, and their documentation
   - Builds a semantic understanding of the codebase
   - Identifies dependencies between components

3. **Knowledge Base Population**
   - Stores extracted information in the semantic memory
   - Generates vector embeddings for semantic search
   - Builds a knowledge graph of the codebase

```mermaid
graph TD
    A[Start: GitHub URL] --> B(Dispatcher Agent);
    B --> C{Clones Repo & Scans Files};
    C --> D[Architect Agent];
    D --> E[Librarian Agent];
    E --> F[Annotator Agent];
    
    subgraph Semantic Memory
        G[Entity Store - SQLite];
        H[Knowledge Graph - NetworkX];
        I[Vector Store - ChromaDB];
    end

    D -- Creates Code Entities & Relationships --> H;
    D -- Stores Code Details --> G;
    E -- Creates Doc Chunks --> I;
    E -- Stores Doc Details --> G;
    F -- Generates Summaries --> G;
    F -- Updates Embeddings --> I;

    F --> J[End: Ingestion Complete];
```
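Step 2 of the flow above (extracting functions, classes, and their documentation) can be sketched with Python's standard `ast` module. The returned entity shape is an assumption for illustration, not RepoRover's schema:

```python
# Sketch of code-entity extraction using the stdlib ast module.
# The entity dict format is illustrative, not RepoRover's schema.
import ast

def extract_entities(source: str) -> list[dict]:
    """Return name, kind, and docstring for each def/class in the source."""
    entities = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            entities.append({
                "name": node.name,
                "kind": type(node).__name__,
                "doc": ast.get_docstring(node),
            })
    return entities

sample = '''
class Greeter:
    """Says hello."""
    def greet(self, name):
        """Greet one person."""
        return f"Hello, {name}!"
'''
for e in extract_entities(sample):
    print(e["kind"], e["name"], "-", e["doc"])
```

A real pipeline would need per-language parsers, but the principle is the same: walk the syntax tree and record each named entity with its documentation.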

## πŸ’¬ Query Processing Workflow

### Trigger
- User submits a natural language question about the codebase

### Process Flow
1. **Query Understanding**
   - Analyzes the user's question
   - Identifies key concepts and intents
   - Determines relevant parts of the codebase to examine

2. **Context Retrieval**
   - Searches the semantic memory for relevant code snippets
   - Retrieves related documentation and examples
   - Gathers contextual information about the code

3. **Response Generation**
   - Formulates a comprehensive answer using AI
   - Includes relevant code examples
   - Provides additional context and suggestions
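The context-retrieval step boils down to nearest-neighbour search over embeddings. A minimal cosine-similarity sketch (the vectors and snippets here are made up purely for illustration) shows the mechanics:

```python
# Toy semantic search over pre-computed embeddings; the vectors and
# snippets are invented for illustration only.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """index: list of (snippet, embedding) pairs; returns the k best snippets."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [snippet for snippet, _ in scored[:k]]

index = [
    ("def clone_repo(url): ...", [0.9, 0.1, 0.0]),
    ("def parse_ast(src): ...", [0.1, 0.9, 0.1]),
    ("def render_ui(): ...", [0.0, 0.1, 0.9]),
]
print(top_k([0.8, 0.2, 0.0], index, k=2))
# ['def clone_repo(url): ...', 'def parse_ast(src): ...']
```

In production a vector database handles this search at scale with approximate nearest-neighbour indexes, but the ranking principle is identical.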

## πŸš€ Deployment Architecture

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                 β”‚     β”‚                     β”‚     β”‚                  β”‚
β”‚  User's Browser β”œβ”€β”€β”€β”€β–Ίβ”‚  FastAPI Backend    │◄───►│  AI Models       β”‚
β”‚                 β”‚     β”‚  (Python)           β”‚     β”‚  (Gemini, Groq)  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                  β”‚
                                  β–Ό
                         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                         β”‚                   β”‚
                         β”‚  Semantic Memory  β”‚
                         β”‚  (ChromaDB)       β”‚
                         β”‚                   β”‚
                         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## πŸ”„ Data Flow

1. **Ingestion Path**
   - GitHub Repo β†’ FastAPI β†’ Background Task β†’ AI Processing β†’ Semantic Memory

2. **Query Path**
   - User Question β†’ FastAPI β†’ AI Model β†’ Semantic Memory β†’ Response Generation β†’ User

```mermaid
graph TD
    A[Start: User Question] --> B(Dispatcher Agent);
    B -- Assembles Cognitive Context --> C[Query Planner Agent];
    
    subgraph Cognitive Context
        D[Episodic Memory - History];
        E[Core Memory - Persona];
    end

    D --> B;
    E --> B;
    
    C -- Creates Plan --> F[Information Retriever Agent];
    F -- Executes Plan --> G((Semantic Memory));
    G -- Returns Data --> H[Synthesizer Agent];
    H -- Generates Response --> I[End: Final Answer];
```