Asish22/code-crawler

Asish Karthikeya Gogineni committed 11 days ago

Commit a3bdcf1 (1 parent: 5059b8f)

Refactor: Code Structure Update & UI Redesign

Files changed (39)

ARCHITECTURE_WALKTHROUGH.md +879 -0
CODE_OF_CONDUCT.md +0 -128
api/routes/index.py +6 -6
app.py +6 -6
architecture_viz.jsx +625 -0
code_chatbot/{agent_workflow.py → agents/agent_workflow.py} +2 -2
code_chatbot/{crews → agents/crews}/__init__.py +0 -0
code_chatbot/{tools.py → agents/tools.py} +0 -0
code_chatbot/analysis/__init__.py +0 -0
code_chatbot/{ast_analysis.py → analysis/ast_analysis.py} +0 -0
code_chatbot/{code_symbols.py → analysis/code_symbols.py} +1 -1
code_chatbot/core/__init__.py +0 -0
code_chatbot/{config.py → core/config.py} +0 -0
code_chatbot/{db_connection.py → core/db_connection.py} +0 -0
code_chatbot/{path_obfuscator.py → core/path_obfuscator.py} +0 -0
code_chatbot/{prompts.py → core/prompts.py} +0 -0
code_chatbot/{rate_limiter.py → core/rate_limiter.py} +0 -0
code_chatbot/ingestion/__init__.py +0 -0
code_chatbot/{chunker.py → ingestion/chunker.py} +0 -0
code_chatbot/{incremental_indexing.py → ingestion/incremental_indexing.py} +3 -3
code_chatbot/{indexer.py → ingestion/indexer.py} +6 -6
code_chatbot/{indexing_progress.py → ingestion/indexing_progress.py} +7 -7
code_chatbot/{merkle_tree.py → ingestion/merkle_tree.py} +0 -0
code_chatbot/{universal_ingestor.py → ingestion/universal_ingestor.py} +44 -4
code_chatbot/mcp/__init__.py +0 -0
code_chatbot/{mcp_client.py → mcp/mcp_client.py} +1 -1
code_chatbot/{mcp_server.py → mcp/mcp_server.py} +0 -0
code_chatbot/retrieval/__init__.py +0 -0
code_chatbot/{graph_rag.py → retrieval/graph_rag.py} +0 -0
code_chatbot/{llm_retriever.py → retrieval/llm_retriever.py} +0 -0
code_chatbot/{rag.py → retrieval/rag.py} +25 -16
code_chatbot/{reranker.py → retrieval/reranker.py} +0 -0
code_chatbot/{retriever_wrapper.py → retrieval/retriever_wrapper.py} +1 -1
components/file_explorer.py +1 -1
components/multi_mode.py +3 -3
components/sidebar.py +1 -1
pages/1_⚡_Code_Studio.py +72 -63
pages/1_⚡_Code_Studio.py.bak +118 -0
tests/test_merkle_tree_simple.py +1 -1

ARCHITECTURE_WALKTHROUGH.md ADDED

	@@ -0,0 +1,879 @@

+# 🕷️ Code Crawler - Complete Architecture Walkthrough
+## Table of Contents
+1. [Project Overview](#project-overview)
+2. [System Architecture](#system-architecture)
+3. [Data Flow Pipeline](#data-flow-pipeline)
+4. [RAG Implementation](#rag-implementation)
+5. [AST Analysis & Graph Creation](#ast-analysis--graph-creation)
+6. [Code Chunking Strategy](#code-chunking-strategy)
+7. [Retrieval System](#retrieval-system)
+8. [Agentic Workflow](#agentic-workflow)
+9. [Frontend & API](#frontend--api)
+10. [Component Deep Dives](#component-deep-dives)
+---
+## Project Overview
+**Code Crawler** is an AI-powered codebase assistant that combines multiple advanced techniques:
+- **RAG (Retrieval-Augmented Generation)**: Vector-based semantic search over code
+- **AST Analysis**: Abstract Syntax Tree parsing for understanding code structure
+- **Graph RAG**: Knowledge graph enhancement for relationship-aware retrieval
+- **Agentic Workflows**: Multi-step reasoning with tool use (LangGraph)
+- **Multi-LLM Support**: Gemini, Groq (Llama 3.3)
+### Key Features
+| Feature | Description |
+|---------|-------------|
+| 💬 Chat Mode | Natural language Q&A about codebase |
+| 🔍 Search Mode | Regex pattern search across files |
+| 🔧 Refactor Mode | AI-assisted code refactoring |
+| ✨ Generate Mode | Spec generation (PO-friendly, Dev Specs, User Stories) |
+---
+## System Architecture
+```
+┌─────────────────────────────────────────────────────────────────────────────┐
+│                            CODE CRAWLER SYSTEM                               │
+├─────────────────────────────────────────────────────────────────────────────┤
+│                                                                              │
+│  ┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐        │
+│  │   DATA INGEST   │────▶│   PROCESSING    │────▶│    STORAGE      │        │
+│  │                 │     │                 │     │                 │        │
+│  │ • ZIP Files     │     │ • AST Parsing   │     │ • Vector DB     │        │
+│  │ • GitHub URLs   │     │ • Chunking      │     │   (Chroma/FAISS)│        │
+│  │ • Local Dirs    │     │ • Embeddings    │     │ • AST Graph     │        │
+│  │ • Web Docs      │     │ • Graph Build   │     │   (GraphML)     │        │
+│  └─────────────────┘     └─────────────────┘     └────────┬────────┘        │
+│                                                           │                  │
+│                                                           ▼                  │
+│  ┌─────────────────────────────────────────────────────────────────────┐    │
+│  │                        RETRIEVAL LAYER                               │    │
+│  │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │    │
+│  │  │   Vector    │  │    LLM      │  │   Graph     │  │  Reranker   │ │    │
+│  │  │  Retriever  │──│  Retriever  │──│  Enhanced   │──│  (Cross-    │ │    │
+│  │  │             │  │             │  │  Retriever  │  │   Encoder)  │ │    │
+│  │  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │    │
+│  └─────────────────────────────────────────────────────────────────────┘    │
+│                                    │                                         │
+│                                    ▼                                         │
+│  ┌─────────────────────────────────────────────────────────────────────┐    │
+│  │                         CHAT ENGINE                                  │    │
+│  │                                                                      │    │
+│  │   ┌──────────────────┐        ┌──────────────────────────┐          │    │
+│  │   │   Linear RAG     │   OR   │   Agentic Workflow       │          │    │
+│  │   │   (Simple)       │        │   (LangGraph)            │          │    │
+│  │   │                  │        │                          │          │    │
+│  │   │  Query → Retrieve│        │  Agent → Tool → Agent    │          │    │
+│  │   │      → Answer    │        │       ↓                  │          │    │
+│  │   │                  │        │  search_codebase         │          │    │
+│  │   │                  │        │  read_file               │          │    │
+│  │   │                  │        │  list_files              │          │    │
+│  │   │                  │        │  find_callers            │          │    │
+│  │   └──────────────────┘        └──────────────────────────┘          │    │
+│  └─────────────────────────────────────────────────────────────────────┘    │
+│                                    │                                         │
+│                                    ▼                                         │
+│  ┌─────────────────────────────────────────────────────────────────────┐    │
+│  │                       FRONTEND LAYER                                 │    │
+│  │                                                                      │    │
+│  │   Streamlit App          FastAPI (REST)         Next.js (React)     │    │
+│  │   ├── app.py             ├── /api/index         ├── /chat           │    │
+│  │   └── Code_Studio.py     ├── /api/chat          ├── /generate       │    │
+│  │                          └── /api/health        └── /search         │    │
+│  └─────────────────────────────────────────────────────────────────────┘    │
+└─────────────────────────────────────────────────────────────────────────────┘
+```
+---
+## Data Flow Pipeline
+### 1. Ingestion Flow
+```
+User Input (ZIP/GitHub/Local)
+         │
+         ▼
+┌─────────────────────────────────────────┐
+│      UniversalIngestor                  │
+│      (universal_ingestor.py)            │
+│                                         │
+│  ┌─────────────┐  ┌─────────────────┐   │
+│  │ _detect_    │  │ Handler Classes │   │
+│  │  handler()  │─▶│                 │   │
+│  └─────────────┘  │ • ZIPFileManager│   │
+│                   │ • GitHubRepoMgr │   │
+│                   │ • LocalDirMgr   │   │
+│                   │ • WebDocManager │   │
+│                   └─────────────────┘   │
+└────────────────────┬────────────────────┘
+                     │
+                     ▼
+            List[Document] + local_path
+```
+**Example: GitHub Repository Processing**
+```python
+from code_chatbot.ingestion.universal_ingestor import UniversalIngestor
+# 1. User provides a GitHub URL
+source = "https://github.com/owner/repo"
+# 2. UniversalIngestor detects the GitHub URL
+ingestor = UniversalIngestor(source)
+# delegate = GitHubRepoManager
+# 3. Download (clone, or fall back to a ZIP download)
+ingestor.download()
+# Clones to: /tmp/code_chatbot/owner_repo/
+# 4. Walk files
+for content, metadata in ingestor.walk():
+    # content = "def hello(): ..."
+    # metadata = {"file_path": "/tmp/.../main.py", "source": "main.py"}
+    print(metadata["source"])
+```
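The walkthrough does not show how `_detect_handler()` picks a delegate. Below is a rough sketch. It assumes the checks run in this order: GitHub URL, other web URL, ZIP archive, local directory. `detect_source_kind` and its return labels are hypothetical names that only mirror the handler classes in the diagram.

```python
import os
from urllib.parse import urlparse


def detect_source_kind(source: str) -> str:
    """Classify an ingestion source (hypothetical stand-in for _detect_handler)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # github.com/<owner>/<repo> URLs go to the repository handler
        if parsed.netloc.lower() in ("github.com", "www.github.com"):
            return "github"
        # Any other web URL is treated as documentation to crawl
        return "web"
    if source.lower().endswith(".zip"):
        return "zip"
    if os.path.isdir(source):
        return "local"
    raise ValueError(f"Unsupported source: {source!r}")


print(detect_source_kind("https://github.com/owner/repo"))  # github
print(detect_source_kind("https://docs.python.org/3/"))     # web
print(detect_source_kind("upload.zip"))                     # zip
```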
+### 2. Indexing Flow
+```
+Documents
+    │
+    ▼
+┌─────────────────────────────────────────────────────────────────┐
+│                        Indexer                                   │
+│                       (indexer.py)                               │
+│                                                                  │
+│  ┌─────────────────┐   ┌─────────────────┐   ┌───────────────┐  │
+│  │StructuralChunker│──▶│ Embedding Model │──▶│  Vector Store │  │
+│  │                 │   │ (Gemini/HF)     │   │ (Chroma/FAISS)│  │
+│  └─────────────────┘   └─────────────────┘   └───────────────┘  │
+│                                                                  │
+│  Additionally:                                                   │
+│  ┌─────────────────┐   ┌─────────────────┐                      │
+│  │ ASTGraphBuilder │──▶│  GraphML File   │                      │
+│  └─────────────────┘   └─────────────────┘                      │
+└─────────────────────────────────────────────────────────────────┘
+```
+---
+## RAG Implementation
+The RAG system in this project is implemented in `code_chatbot/retrieval/rag.py` with these key components:
+### ChatEngine Class
+```python
+class ChatEngine:
+    def __init__(self, retriever, model_name, provider, ...):
+        # 1. Base retriever (from vector store)
+        self.base_retriever = retriever
+        # 2. Enhanced retriever with reranking
+        self.vector_retriever = build_enhanced_retriever(
+            base_retriever=retriever,
+            use_multi_query=use_multi_query,
+            use_reranking=True  # Uses Cross-Encoder
+        )
+        # 3. LLM Retriever (file-aware)
+        self.llm_retriever = LLMRetriever(llm, repo_files)
+        # 4. Ensemble Retriever (combines both)
+        self.retriever = EnsembleRetriever(
+            retrievers=[self.vector_retriever, self.llm_retriever],
+            weights=[0.6, 0.4]  # 60% vector, 40% LLM
+        )
+```
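LangChain documents `EnsembleRetriever` as merging its retrievers' results with weighted Reciprocal Rank Fusion (constant `c`, default 60). The 0.6/0.4 weights therefore act on ranks, not on raw similarity scores. The stdlib sketch below applies that technique to plain document IDs. It illustrates the idea and is not LangChain's code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(
    ranked_lists: Sequence[List[str]], weights: Sequence[float], c: int = 60
) -> List[str]:
    """Fuse several ranked lists of doc IDs with weighted Reciprocal Rank Fusion."""
    scores: Dict[str, float] = defaultdict(float)
    for docs, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(docs, start=1):
            # Each retriever contributes weight / (rank + c); better ranks count more
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=scores.__getitem__, reverse=True)


vector_hits = ["auth.py", "session.py", "utils.py"]
llm_hits = ["login.py", "auth.py"]
print(weighted_rrf([vector_hits, llm_hits], weights=[0.6, 0.4]))
# ['auth.py', 'session.py', 'utils.py', 'login.py']
```

`auth.py` comes first because both retrievers return it. `login.py` is the LLM retriever's top pick but lands last. With c = 60, its 0.4 weight is worth less than the vector retriever's 0.6 at every rank shown.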
+### RAG Flow Example
+```
+User Query: "How does the authentication work?"
+                    │
+                    ▼
+┌─────────────────────────────────────────────────────────────┐
+│ 1. RETRIEVAL                                                │
+│    ┌──────────────────┐      ┌──────────────────┐           │
+│    │ Vector Retriever │      │ LLM Retriever    │           │
+│    │                  │      │                  │           │
+│    │ Semantic search  │      │ LLM picks files  │           │
+│    │ in Chroma DB     │      │ from structure   │           │
+│    └────────┬─────────┘      └────────┬─────────┘           │
+│             │                         │                      │
+│             └────────────┬────────────┘                      │
+│                          ▼                                   │
+│              ┌─────────────────────┐                         │
+│              │ EnsembleRetriever   │                         │
+│              │ (60% + 40% weighted)│                         │
+│              └─────────┬───────────┘                         │
+│                        │                                     │
+│                        ▼                                     │
+│              ┌─────────────────────┐                         │
+│              │ Reranker            │                         │
+│              │ (Cross-Encoder)     │                         │
+│              │ ms-marco-MiniLM     │                         │
+│              └─────────┬───────────┘                         │
+│                        │                                     │
+│                        ▼                                     │
+│              Top 5 Most Relevant Docs                        │
+└─────────────────────────────────────────────────────────────┘
+                    │
+                    ▼
+┌─────────────────────────────────────────────────────────────┐
+│ 2. GENERATION                                               │
+│                                                             │
+│    System Prompt + Context + History + Question             │
+│                        │                                    │
+│                        ▼                                    │
+│              ┌─────────────────────┐                        │
+│              │ LLM (Gemini/Groq)   │                        │
+│              └─────────┬───────────┘                        │
+│                        │                                    │
+│                        ▼                                    │
+│                   Answer + Sources                          │
+└─────────────────────────────────────────────────────────────┘
+```
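The generation step combines four inputs. The sketch below shows one possible layout. It assumes the retrieved snippets go into the system message, each tagged with its source path so the answer can cite sources. `build_messages` and the `(role, text)` tuple format are illustrative and are not the project's prompt code.

```python
from typing import Dict, List, Tuple


def build_messages(
    system_prompt: str,
    docs: List[Dict[str, str]],
    history: List[Tuple[str, str]],
    question: str,
) -> List[Tuple[str, str]]:
    """Assemble chat messages in the order: system + context, history, question."""
    # Tag each snippet with its source path so the answer can cite it
    context = "\n\n".join(f"[{d['source']}]\n{d['content']}" for d in docs)
    messages = [("system", f"{system_prompt}\n\nContext:\n{context}")]
    messages.extend(history)  # earlier (role, text) turns, oldest first
    messages.append(("user", question))
    return messages


msgs = build_messages(
    "You answer questions about this codebase.",
    [{"source": "auth/login.py", "content": "def authenticate(user, pw): ..."}],
    [("user", "Hi"), ("assistant", "Hello! Ask me about the code.")],
    "How does the authentication work?",
)
print(len(msgs))  # 4
```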
+---
+## AST Analysis & Graph Creation
+The AST analysis is implemented in `code_chatbot/analysis/ast_analysis.py` using **tree-sitter** for multi-language parsing.
+### How AST Parsing Works
+```python
+# Example: Parsing a Python file
+# Source code:
+"""
+from typing import List
+class UserService:
+    def __init__(self, db):
+        self.db = db
+    def get_user(self, user_id: int) -> User:
+        return self.db.find(user_id)
+    def create_user(self, name: str) -> User:
+        user = User(name=name)
+        self.db.save(user)
+        return user
+"""
+# tree-sitter parses this into an AST:
+"""
+module
+├── import_from_statement
+│   ├── module: "typing"
+│   └── names: ["List"]
+├── class_definition
+│   ├── name: "UserService"
+│   └── block
+│       ├── function_definition (name: "__init__")
+│       ├── function_definition (name: "get_user")
+│       │   └── call (function: "self.db.find")
+│       └── function_definition (name: "create_user")
+│           ├── call (function: "User")
+│           └── call (function: "self.db.save")
+"""
+```
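tree-sitter is a third-party library. The sketch below substitutes Python's built-in `ast` module to extract the same facts from the example source: each class method and the calls it makes. It is a stand-in for illustration, not the project's parser. `ast.unparse` requires Python 3.9 or newer.

```python
import ast

SOURCE = '''
from typing import List
class UserService:
    def __init__(self, db):
        self.db = db
    def get_user(self, user_id: int) -> User:
        return self.db.find(user_id)
    def create_user(self, name: str) -> User:
        user = User(name=name)
        self.db.save(user)
        return user
'''


def outline(source: str) -> dict:
    """Map 'Class.method' to the dotted names of the calls made inside it."""
    result = {}
    tree = ast.parse(source)  # parsing only: User need not be defined
    for cls in (n for n in tree.body if isinstance(n, ast.ClassDef)):
        for fn in (n for n in cls.body if isinstance(n, ast.FunctionDef)):
            calls = [
                ast.unparse(node.func)  # e.g. "self.db.find"
                for node in ast.walk(fn)
                if isinstance(node, ast.Call)
            ]
            result[f"{cls.name}.{fn.name}"] = calls
    return result


for name, calls in outline(SOURCE).items():
    print(name, calls)
```

`__init__` reports no calls. `create_user` reports both `User` and `self.db.save`, which matches the tree above.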
+### EnhancedCodeAnalyzer
+```python
+import networkx as nx
+class EnhancedCodeAnalyzer:
+    """Builds a knowledge graph from code"""
+    def __init__(self):
+        self.graph = nx.DiGraph()  # NetworkX directed graph
+        self.functions = {}         # node_id -> FunctionInfo
+        self.classes = {}           # node_id -> ClassInfo
+        self.imports = {}           # file_path -> [ImportInfo]
+        self.definitions = {}       # name -> [node_ids]
+        self.unresolved_calls = []  # (caller_id, callee_name, line), resolved after parsing
+```
+### Graph Structure Example
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                    AST KNOWLEDGE GRAPH                          │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│  Nodes:                                                         │
+│  ┌──────────────────┐                                          │
+│  │ Type: "file"     │                                          │
+│  │ Name: "api.py"   │                                          │
+│  └────────┬─────────┘                                          │
+│           │ defines                                             │
+│           ▼                                                     │
+│  ┌──────────────────┐         ┌──────────────────┐             │
+│  │ Type: "class"    │         │ Type: "function" │             │
+│  │ Name: "UserAPI"  │         │ Name: "main"     │             │
+│  └────────┬─────────┘         └──────────────────┘             │
+│           │ has_method                                          │
+│           ▼                                                     │
+│  ┌──────────────────┐                                          │
+│  │ Type: "method"   │───calls───▶ UserService.get_user         │
+│  │ Name: "get"      │                                          │
+│  └──────────────────┘                                          │
+│                                                                 │
+│  Edges:                                                         │
+│  • defines: file -> class/function                              │
+│  • has_method: class -> method                                  │
+│  • calls: function -> function                                  │
+│  • imports: file -> module                                      │
+│  • inherits_from: class -> class                                │
+└─────────────────────────────────────────────────────────────────┘
+```
+### Call Graph Resolution
+```python
+def resolve_call_graph(self):
+    """
+    After parsing all files, resolve function calls to their definitions.
+    Example:
+    - File A has: service.get_user(id)
+    - File B has: def get_user(self, id): ...
+    Resolution:
+    - Finds that "get_user" is defined in File B
+    - Creates edge: A::caller_func --calls--> B::UserService.get_user
+    """
+    for caller_id, callee_name, line in self.unresolved_calls:
+        # Try direct match
+        if callee_name in self.definitions:
+            for target_id in self.definitions[callee_name]:
+                self.graph.add_edge(caller_id, target_id, relation="calls")
+```
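The receiver's type (`service` in the docstring example) is usually unknown, so a name-based resolver can fall back to the last dotted segment of the callee. The sketch below assumes node IDs of the form `file::Qualified.name` and a short-name index. Both are illustrative and may differ from the project's actual data layout.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical short-name index: last name segment -> fully qualified node IDs
definitions: Dict[str, List[str]] = defaultdict(list)
for node_id in ["b.py::UserService.get_user", "c.py::AdminService.get_user", "b.py::helper"]:
    short_name = node_id.split("::")[1].split(".")[-1]
    definitions[short_name].append(node_id)

# (caller node ID, callee expression as written at the call site, line number)
unresolved_calls: List[Tuple[str, str, int]] = [
    ("a.py::handler", "service.get_user", 12),
    ("a.py::handler", "helper", 13),
    ("a.py::handler", "print", 14),  # builtin: no definition in the repo, so no edge
]


def resolve_calls(calls: Iterable[Tuple[str, str, int]],
                  defs: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Link each call to every definition whose short name matches."""
    edges: Set[Tuple[str, str]] = set()
    for caller_id, callee_expr, _line in calls:
        # "service.get_user" -> "get_user": the receiver's type is unknown here
        name = callee_expr.split(".")[-1]
        for target_id in defs.get(name, []):
            edges.add((caller_id, target_id))
    return edges


for caller, target in sorted(resolve_calls(unresolved_calls, definitions)):
    print(caller, "--calls-->", target)
```

Both `get_user` definitions get linked, so the call graph is over-approximated. Trading precision for recall like this is a common choice when no type information is available.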
+---
+## Code Chunking Strategy
+The chunking system in `code_chatbot/ingestion/chunker.py` uses **structural chunking** based on AST boundaries.
+### Chunking Philosophy
+```
+Traditional Text Chunking:
+┌─────────────────────────────────────────┐
+│ def process_data():        │ CHUNK 1    │
+│     data = load()          │            │
+│     # Some processing      │            │
+│ ───────────────────────────┼────────────│
+│     result = transform()   │ CHUNK 2    │  ← Breaks mid-function!
+│     return result          │            │
+└─────────────────────────────────────────┘
+Structural Chunking (This Project):
+┌─────────────────────────────────────────┐
+│ def process_data():        │            │
+│     data = load()          │ CHUNK 1    │  ← Complete function
+│     result = transform()   │            │
+│     return result          │            │
+├─────────────────────────────────────────┤
+│ def another_function():    │            │
+│     ...                    │ CHUNK 2    │  ← Complete function
+└─────────────────────────────────────────┘
+```
+### StructuralChunker Implementation
+```python
+class StructuralChunker:
+    """Uses tree-sitter to chunk code at semantic boundaries"""
+    def __init__(self, max_tokens: int = 800):
+        self.max_tokens = max_tokens
+        self._init_parsers()  # Python, JS, TS parsers
+    def _chunk_node(self, node, file_content, file_metadata):
+        """
+        Recursive chunking algorithm:
+        1. If node fits in max_tokens → return as single chunk
+        2. If node is too large → recurse into children
+        3. Merge neighboring small chunks
+        """
+        chunk = FileChunk(file_content, file_metadata,
+                          node.start_byte, node.end_byte)
+        # Fits, or is a leaf that cannot be split further? Return it
+        if chunk.num_tokens <= self.max_tokens or not node.children:
+            return [chunk]
+        # Too large? Recurse into each child with the same file context
+        child_chunks = []
+        for child in node.children:
+            child_chunks.extend(self._chunk_node(child, file_content, file_metadata))
+        # Merge small neighbors
+        return self._merge_small_chunks(child_chunks)
+```
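The walkthrough does not show `_merge_small_chunks`, so the merge step below is a guess: it greedily extends the previous span while the combined text stays within the budget. `Node` is a toy stand-in for a tree-sitter node, and tokens are counted by splitting on whitespace. Both are simplifications for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Span = Tuple[int, int]


@dataclass
class Node:
    """Toy stand-in for a tree-sitter node: a character span plus children."""
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)


def num_tokens(text: str) -> int:
    return len(text.split())  # whitespace tokens; a real chunker would use a tokenizer


def chunk(node: Node, source: str, max_tokens: int) -> List[Span]:
    """Return (start, end) spans that follow node boundaries where possible."""
    if num_tokens(source[node.start:node.end]) <= max_tokens or not node.children:
        return [(node.start, node.end)]  # fits, or is a leaf that cannot be split
    spans: List[Span] = []
    for child in node.children:
        spans.extend(chunk(child, source, max_tokens))
    return merge_small(spans, source, max_tokens)


def merge_small(spans: List[Span], source: str, max_tokens: int) -> List[Span]:
    """Greedily merge neighboring spans while the merged text still fits."""
    merged = [spans[0]]
    for start, end in spans[1:]:
        prev_start, _ = merged[-1]
        if num_tokens(source[prev_start:end]) <= max_tokens:
            merged[-1] = (prev_start, end)  # extend the previous chunk
        else:
            merged.append((start, end))
    return merged


funcs = [
    "def a():\n    return 1\n",
    "def b():\n    return 2\n",
    "def c():\n    x = 1\n    y = 2\n    return x + y\n",
]
src = "\n".join(funcs)
children, pos = [], 0
for f in funcs:
    children.append(Node(pos, pos + len(f)))
    pos += len(f) + 1  # skip the "\n" separator
root = Node(0, len(src), children)
for s, e in chunk(root, src, max_tokens=8):
    print(repr(src[s:e]))
```

`a` and `b` (4 tokens each) merge into one chunk. `c` (12 tokens) comes back on its own and still over budget, because it is a leaf and there is nothing smaller to split it into.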
+### Chunk Metadata (Rich Context)
+Each chunk carries rich metadata:
+```python
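+from dataclasses import dataclass, field
+from typing import Dict, List
+@dataclass
+class FileChunk:
+    file_content: str
+    file_metadata: Dict
+    start_byte: int
+    end_byte: int
+    # Enhanced metadata (defaults let a chunk be built from a byte span alone)
+    symbols_defined: List[str] = field(default_factory=list)  # ["UserService", "UserService.get_user"]
+    imports_used: List[str] = field(default_factory=list)     # ["from typing import List"]
+    complexity_score: int = 0                                 # Cyclomatic complexity
+    parent_context: str = ""                                  # "UserService" (parent class)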
+```
+This metadata is stored in the vector DB and used for filtering/ranking.
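`complexity_score` is described as cyclomatic complexity. A common approximation counts 1 plus one per decision point, and the sketch below applies that rule with Python's `ast` module. The walkthrough does not say which constructs the project counts. The set used here is an assumption: `if`/`elif`, conditional expressions, `for` and `while` loops, `except` handlers, and each extra operand of `and`/`or`.

```python
import ast


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)):
            complexity += 1  # each branch or loop adds one independent path
        elif isinstance(node, ast.BoolOp):
            complexity += len(node.values) - 1  # "a and b and c" adds two
    return complexity


code = '''
def login(user, password):
    if not user or not password:
        return False
    for attempt in range(3):
        if check(user, password):
            return True
    return False
'''
print(cyclomatic_complexity(code))  # 5
```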
+---
+## Retrieval System
+### Multi-Stage Retrieval Pipeline
+```
+Query: "How does user authentication work?"
+                    │
+                    ▼
+┌───────────────────────────────────────────────────────────────┐
+│  STAGE 1: Initial Retrieval (k=10)                            │
+│                                                               │
+│  ┌─────────────────────────────────────────────────────────┐  │
+│  │               Vector Store (Chroma)                      │  │
+│  │                                                          │  │
+│  │  Query Embedding ──similarity──▶ Document Embeddings     │  │
+│  │                                                          │  │
+│  │  Returns: 10 candidate documents                         │  │
+│  └─────────────────────────────────────────────────────────┘  │
+└───────────────────────────────────────────────────────────────┘
+                    │
+                    ▼
+┌───────────────────────────────────────────────────────────────┐
+│  STAGE 2: LLM-Based File Selection                            │
+│                                                               │
+│  ┌─────────────────────────────────────────────────────────┐  │
+│  │              LLMRetriever                                │  │
+│  │                                                          │  │
+│  │  File Tree:                                              │  │
+│  │  ├── src/                                                │  │
+│  │  │   ├── auth/                                           │  │
+│  │  │   │   ├── login.py      ◄── LLM selects this         │  │
+│  │  │   │   └── middleware.py ◄── And this                 │  │
+│  │  │   └── api/                                            │  │
+│  │  └── tests/                                              │  │
+│  │                                                          │  │
+│  │  LLM Prompt: "Select top 5 relevant files for: ..."      │  │
+│  └─────────────────────────────────────────────────────────┘  │
+└───────────────────────────────────────────────────────────────┘
+                    │
+                    ▼
+┌───────────────────────────────────────────────────────────────┐
+│  STAGE 3: Ensemble Combination                                │
+│                                                               │
+│  Vector Results (weight: 0.6) + LLM Results (weight: 0.4)     │
+│                                                               │
+│  Combined: 12-15 unique documents                             │
+└───────────────────────────────────────────────────────────────┘
+                    │
+                    ▼
+┌───────────────────────────────────────────────────────────────┐
+│  STAGE 4: Graph Enhancement                                   │
+│                                                               │
+│  For each retrieved document:                                 │
+│  1. Find its node in AST graph                                │
+│  2. Get neighboring nodes (related files)                     │
+│  3. Add related files to context                              │
+│                                                               │
+│  Example: login.py found → adds auth_utils.py (imports it)    │
+└───────────────────────────────────────────────────────────────┘
+                    │
+                    ▼
+┌───────────────────────────────────────────────────────────────┐
+│  STAGE 5: Reranking                                           │
+│                                                               │
+│  ┌─────────────────────────────────────────────────────────┐  │
+│  │              Cross-Encoder Reranker                      │  │
+│  │              (ms-marco-MiniLM-L-6-v2)                     │  │
+│  │                                                          │  │
+│  │  For each (query, document) pair:                        │  │
+│  │  score = cross_encoder.predict([query, doc.content])     │  │
+│  │                                                          │  │
+│  │  Sort by score, return top 5                             │  │
+│  └─────────────────────────────────────────────────────────┘  │
+└───────────────────────────────────────────────────────────────┘
+                    │
+                    ▼
+            Final: Top 5 Documents
+```
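Stage 4 can be sketched as a one-hop neighbor expansion over the import graph. The sketch assumes imports are available as a mapping from each file to the files it imports. It treats links as undirected, so both importers and imported files count as related. The mapping, the `max_extra` cap, and the function name are all hypothetical.

```python
from typing import Dict, List, Set


def expand_with_neighbors(
    retrieved: List[str], imports: Dict[str, Set[str]], max_extra: int = 5
) -> List[str]:
    """Append files directly linked to the retrieved ones in the import graph."""
    # Undirected adjacency: "A imports B" relates A and B both ways
    neighbors: Dict[str, Set[str]] = {}
    for src, targets in imports.items():
        for dst in targets:
            neighbors.setdefault(src, set()).add(dst)
            neighbors.setdefault(dst, set()).add(src)
    result = list(retrieved)
    seen = set(retrieved)
    for path in retrieved:
        for other in sorted(neighbors.get(path, ())):  # sorted for a stable order
            if other not in seen and len(result) - len(retrieved) < max_extra:
                result.append(other)
                seen.add(other)
    return result


imports = {
    "login.py": {"auth_utils.py"},
    "api.py": {"login.py"},
    "tests/test_login.py": {"login.py"},
}
print(expand_with_neighbors(["login.py"], imports))
# ['login.py', 'api.py', 'auth_utils.py', 'tests/test_login.py']
```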
+### Reranker (Cross-Encoder)
+```python
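+from typing import List
+from langchain_core.documents import Document
+from sentence_transformers import CrossEncoder
+class Reranker: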
+    """
+    Uses a Cross-Encoder for precise relevance scoring.
+    Unlike bi-encoders (used for initial retrieval), cross-encoders
+    process query AND document together, giving more accurate scores.
+    """
+    def __init__(self, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2"):
+        self.model = CrossEncoder(model_name)
+    def rerank(self, query: str, documents: List[Document], top_k=5):
+        # Score each document against the query
+        pairs = [[query, doc.page_content] for doc in documents]
+        scores = self.model.predict(pairs)
+        # Sort by score
+        scored = sorted(zip(documents, scores), key=lambda x: x[1], reverse=True)
+        return [doc for doc, _ in scored[:top_k]]
+```
+---
+## Agentic Workflow
+The agentic workflow uses **LangGraph** to enable multi-step reasoning with tool use.
+### Agent Graph Structure
+```
+┌─────────────────────────────────────────────────────────────────┐
+│                    LANGGRAPH AGENT                              │
+├─────────────────────────────────────────────────────────────────┤
+│                                                                 │
+│                    ┌─────────────┐                              │
+│         ┌─────────│   START     │─────────┐                     │
+│         │         └─────────────┘         │                     │
+│         ▼                                 │                     │
+│  ┌─────────────────────────────────────┐  │                     │
+│  │           AGENT NODE                │  │                     │
+│  │                                     │  │                     │
+│  │  1. Process messages                │  │                     │
+│  │  2. Call LLM with tools bound       │  │                     │
+│  │  3. LLM decides:                    │  │                     │
+│  │     - Call a tool? → go to TOOLS    │  │                     │
+│  │     - Final answer? → go to END     │  │                     │
+│  └──────────────┬──────────────────────┘  │                     │
+│                 │                         │                     │
+│       has_tool_call?                      │                     │
+│         │     │                           │                     │
+│    Yes  │     │  No                       │                     │
+│         │     │                           │                     │
+│         ▼     └──────────────────────────▶┤                     │
+│  ┌─────────────────────────────────────┐  │                     │
+│  │           TOOLS NODE                │  │                     │
+│  │                                     │  │                     │
+│  │  Execute tool calls:                │  │                     │
+│  │  • search_codebase(query)           │  │                     │
+│  │  • read_file(path)                  │  │                     │
+│  │  • list_files(dir)                  │  │                     │
+│  │  • find_callers(func)               │  │                     │
+│  │  • find_callees(func)               │  │                     │
+│  │  • find_call_chain(a, b)            │  │                     │
+│  │                                     │  │                     │
+│  │  Add tool results to messages       │  │                     │
+│  └──────────────┬──────────────────────┘  │                     │
+│                 │                         │                     │
+│                 └─────────────────────────┘                     │
+│                                                                 │
+│                         ▼                                       │
+│                  ┌─────────────┐                                │
+│                  │     END     │                                │
+│                  └─────────────┘                                │
+└─────────────────────────────────────────────────────────────────┘
+```
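Without LangGraph, the control flow in the diagram reduces to a loop. Call the model; stop if it returns a final answer; otherwise run the requested tool and feed the result back. The stdlib sketch below uses a scripted fake model in place of an LLM. Its `(kind, payload, arg)` step format is an assumption and is not LangGraph's message type. The `max_steps` guard plays a role similar to LangGraph's recursion limit.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A model step returns ("tool", tool_name, argument) or ("final", answer, None)
Step = Tuple[str, str, Optional[str]]


def run_agent(
    model: Callable[[List[str]], Step],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 8,
) -> str:
    """AGENT -> TOOLS -> AGENT loop that stops at a final answer (END)."""
    messages = [f"user: {question}"]
    for _ in range(max_steps):
        kind, payload, arg = model(messages)           # AGENT node
        if kind == "final":
            return payload                             # no tool call -> END
        result = tools[payload](arg)                   # TOOLS node
        messages.append(f"tool[{payload}]: {result}")  # feed result back to AGENT
    return "Stopped: step limit reached"


def scripted_model(messages: List[str]) -> Step:
    """Fake LLM: search once, read one file, then answer."""
    n_tool = sum(m.startswith("tool[") for m in messages)
    if n_tool == 0:
        return ("tool", "search_codebase", "login invalid password")
    if n_tool == 1:
        return ("tool", "read_file", "src/auth/login.py")
    return ("final", f"Answer based on {n_tool} tool results", None)


tools = {
    "search_codebase": lambda q: "login.py: authenticate()",
    "read_file": lambda p: "def authenticate(user, pw): ...",
}
print(run_agent(scripted_model, tools, "How are invalid passwords handled?"))
# Answer based on 2 tool results
```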
+### Available Tools
+```python
+import os
+from langchain_core.tools import tool
+# retriever, analyzer and format_results come from the enclosing module (not shown);
+# repo_root is an illustrative name for the root of the indexed checkout
+# 1. search_codebase - Semantic search in vector store
+@tool("search_codebase")
+def search_codebase(query: str):
+    """Search the codebase for relevant code snippets."""
+    docs = retriever.invoke(query)
+    return format_results(docs[:5])
+# 2. read_file - Read complete file content
+@tool("read_file")
+def read_file(file_path: str):
+    """Read the content of a specific file."""
+    full_path = os.path.join(repo_root, file_path)  # resolve against the repo root
+    with open(full_path, "r", encoding="utf-8") as f:
+        return f.read()
+# 3. list_files - Directory listing
+@tool("list_files")
+def list_files(path: str = "."):
+    """List files in a directory."""
+    target_path = os.path.join(repo_root, path)
+    return "\n".join(os.listdir(target_path))
+# 4. find_callers - Call graph: who calls this?
+@tool("find_callers")
+def find_callers(function_name: str):
+    """Find all functions that call this function."""
+    return analyzer.get_callers(function_name)
+# 5. find_callees - Call graph: what does this call?
+@tool("find_callees")
+def find_callees(function_name: str):
+    """Find all functions called by this function."""
+    return analyzer.get_callees(function_name)
+# 6. find_call_chain - Call graph: trace execution path
+@tool("find_call_chain")
+def find_call_chain(start: str, end: str):
+    """Find the call path from one function to another."""
+    return analyzer.get_call_chain(start, end)
+```
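`find_call_chain(start, end)` needs a path search restricted to call edges. The sketch assumes the call graph is available as a mapping from each function to its callees, a simplification of the NetworkX graph kept by `EnhancedCodeAnalyzer`. Breadth-first search then returns a shortest chain. The function name and data are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


def shortest_call_chain(calls: Dict[str, List[str]], start: str, end: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest caller -> callee path."""
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == end:
            # Walk parent links back to start, then reverse
            path: List[str] = []
            node: Optional[str] = current
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for callee in calls.get(current, []):
            if callee not in parents:  # visit each function once (handles recursion)
                parents[callee] = current
                queue.append(callee)
    return None  # end is unreachable from start


calls = {
    "login": ["authenticate"],
    "authenticate": ["get_user", "verify_password", "create_session"],
    "verify_password": ["hash_password"],
}
print(shortest_call_chain(calls, "login", "hash_password"))
# ['login', 'authenticate', 'verify_password', 'hash_password']
```

Restricting the search to call edges matters. A plain shortest-path search over the full knowledge graph could route through `defines` or `imports` edges, which are not calls.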
+### Agent Execution Example
+```
+User: "How does the login function handle invalid passwords?"
+Agent Thought: I need to find the login function first.
+Agent Action: search_codebase("login function authentication")
+Tool Result: [login.py snippet showing authenticate() function]
+Agent Thought: I found authenticate(). Let me see the full implementation.
+Agent Action: read_file("src/auth/login.py")
+Tool Result: [Full file content]
+Agent Thought: I see it calls verify_password(). Let me check that.
+Agent Action: find_callees("authenticate")
+Tool Result: verify_password, get_user, create_session
+Agent Action: search_codebase("verify_password invalid password")
+Tool Result: [password_utils.py with error handling]
+Agent Final Answer: The login function handles invalid passwords by...
+```
+---
+## Frontend & API
+### Streamlit App Structure
+```
+app.py (Main Entry)
+    │
+    ├── Ingestion Screen
+    │   ├── Source Type Selection (ZIP/GitHub/Web)
+    │   ├── File Upload / URL Input
+    │   └── "Process & Index" Button
+    │
+    └── Redirects to → pages/1_⚡_Code_Studio.py
+Code_Studio.py
+    │
+    ├── Left Panel (Tabs)
+    │   ├── 📁 Explorer - File tree navigation
+    │   ├── 🔍 Search - Regex pattern search
+    │   ├── 💬 Chat - RAG conversation
+    │   └── ✨ Generate - Spec generation
+    │
+    └── Right Panel
+        └── Code Viewer - Syntax highlighted file view
+```
+### FastAPI REST API
+```
+/api
+  ├── /health     GET   - Health check
+  │
+  ├── /index      POST  - Index a codebase
+  │   Body: {
+  │     source: "https://github.com/...",
+  │     provider: "gemini",
+  │     use_agent: true
+  │   }
+  │
+  └── /chat       POST  - Ask questions
+      Body: {
+        question: "How does auth work?",
+        provider: "gemini",
+        use_agent: true
+      }
+      Response: {
+        answer: "...",
+        sources: [...],
+        mode: "agent",
+        processing_time: 2.5
+      }
+```
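A client talks to these endpoints with JSON bodies. A minimal standard-library sketch that builds (without sending) a `POST /api/chat` request carrying the body fields listed above. The base URL `http://localhost:8000` is an assumption, since the server address is not given here, and `build_chat_request` is a hypothetical helper.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: point this at wherever the API is served


def build_chat_request(
    question: str, provider: str = "gemini", use_agent: bool = True
) -> urllib.request.Request:
    """Build a POST /api/chat request with the documented body fields."""
    body = {"question": question, "provider": provider, "use_agent": use_agent}
    return urllib.request.Request(
        f"{BASE_URL}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("How does auth work?")
print(req.get_method(), req.full_url)  # POST http://localhost:8000/api/chat

# Against a running server, the response fields shown above could be read like this:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     print(reply["answer"], reply["mode"], reply["processing_time"])
```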
+---
+## Component Deep Dives
+### Merkle Tree (Incremental Indexing)
+```python
+class MerkleTree:
+    """
+    Enables incremental indexing by detecting file changes.
+    How it works:
+    1. Build a hash tree mirroring directory structure
+    2. Each file node has SHA-256 hash of content
+    3. Each directory node has hash of children hashes
+    4. Compare old vs new tree to find changes
+    """
+    def compare_trees(self, old, new) -> ChangeSet:
+        # Returns: added, modified, deleted, unchanged files
+        ...
+```
+**Example:**
+```
+First Index:
+  project/
+  ├── main.py    (hash: abc123)
+  └── utils.py   (hash: def456)
+  Root hash: sha256(abc123 + def456) = xyz789
+Second Index (utils.py changed):
+  project/
+  ├── main.py    (hash: abc123)  ← unchanged
+  └── utils.py   (hash: ghi012)  ← NEW HASH!
+  Root hash changed! → Only re-index utils.py
+```
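The change detection above can be sketched with the standard library. This is a minimal, flat version assuming files are compared by the SHA-256 of their bytes. Unlike the example, the root hash here also mixes in each path so that a rename changes the root. The real `MerkleTree` additionally keeps per-directory hashes so unchanged subtrees can be skipped early, which this sketch omits. `ChangeSet`, `hash_files`, `root_hash`, and `compare_snapshots` are illustrative names, not the project's API.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def hash_files(files: Dict[str, bytes]) -> Dict[str, str]:
    """Leaf level: map each relative path to the SHA-256 of its content."""
    return {path: sha256_hex(content) for path, content in files.items()}


def root_hash(file_hashes: Dict[str, str]) -> str:
    """Combine (path, hash) pairs in sorted-path order into a single root hash."""
    combined = "".join(f"{path}:{file_hashes[path]}\n" for path in sorted(file_hashes))
    return sha256_hex(combined.encode("utf-8"))


@dataclass
class ChangeSet:
    added: List[str] = field(default_factory=list)
    modified: List[str] = field(default_factory=list)
    deleted: List[str] = field(default_factory=list)
    unchanged: List[str] = field(default_factory=list)


def compare_snapshots(old: Dict[str, str], new: Dict[str, str]) -> ChangeSet:
    changes = ChangeSet()
    if root_hash(old) == root_hash(new):
        changes.unchanged = sorted(new)  # identical roots: nothing to re-index
        return changes
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes.added.append(path)
        elif path not in new:
            changes.deleted.append(path)
        elif old[path] != new[path]:
            changes.modified.append(path)
        else:
            changes.unchanged.append(path)
    return changes


first = hash_files({"main.py": b"print('hi')\n", "utils.py": b"def f(): pass\n"})
second = hash_files({"main.py": b"print('hi')\n", "utils.py": b"def f(): return 1\n"})
print(compare_snapshots(first, second).modified)  # ['utils.py']
```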
+### Path Obfuscation (Privacy)
+```python
+class PathObfuscator:
+    """
+    Obfuscates file paths for sensitive codebases.
+    Original: /home/user/secret-project/src/auth/login.py
+    Obfuscated: /f8a3b2c1/d4e5f6a7/89012345.py
+    Mapping stored securely, reversible only with key.
+    """
+```
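The docstring does not say how `PathObfuscator` derives its tokens. One plausible way to get deterministic, key-dependent tokens like those shown is a keyed hash (HMAC-SHA256) per path component, truncated to 8 hex characters, with a reverse mapping kept for de-obfuscation. This is a sketch under that assumption; `SketchPathObfuscator` is a hypothetical name, and this version tokenizes every component rather than reproducing the exact shape of the example.

```python
import hashlib
import hmac
from typing import Dict


class SketchPathObfuscator:
    """Replace each path component with an 8-hex-char HMAC-SHA256 token.

    File extensions are kept so language detection still works. The reverse
    mapping lives in memory here; a real tool would persist it securely.
    """

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._reverse: Dict[str, str] = {}  # token -> original component

    def _token(self, component: str) -> str:
        digest = hmac.new(self._key, component.encode("utf-8"), hashlib.sha256).hexdigest()
        token = digest[:8]
        # 32-bit tokens can collide in huge trees; fail loudly instead of mixing paths.
        existing = self._reverse.setdefault(token, component)
        if existing != component:
            raise ValueError(f"token collision: {existing!r} vs {component!r}")
        return token

    def obfuscate(self, path: str) -> str:
        *dirs, filename = path.strip("/").split("/")
        stem, dot, ext = filename.rpartition(".")
        if not dot:  # no extension, e.g. "Makefile"
            stem, ext = filename, ""
        tokens = [self._token(d) for d in dirs]
        tokens.append(self._token(stem) + dot + ext)
        return "/" + "/".join(tokens)

    def deobfuscate(self, obfuscated: str) -> str:
        *dirs, last = obfuscated.strip("/").split("/")
        # Tokens are hex, so the first "." marks the start of the extension.
        token, dot, ext = last.partition(".")
        parts = [self._reverse[t] for t in dirs]
        parts.append(self._reverse[token] + dot + ext)
        return "/" + "/".join(parts)


obfuscator = SketchPathObfuscator(key=b"replace-with-a-secret-key")
hidden = obfuscator.obfuscate("/home/user/secret-project/src/auth/login.py")
print(hidden)  # six 8-hex-char tokens, the last one ending in ".py"
print(obfuscator.deobfuscate(hidden))  # /home/user/secret-project/src/auth/login.py
```

Because HMAC is one-way, reversal relies on the stored mapping; the key only ensures tokens cannot be recomputed from guessed names by someone who lacks it.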
+### Rate Limiter (API Management)
+```python
+class AdaptiveRateLimiter:
+    """
+    Handles rate limits for free-tier APIs.
+    Gemini Free Tier: 15 RPM, 32K TPM, 1500 RPD
+    Strategies:
+    1. Track usage in rolling window
+    2. Adaptive delay based on remaining quota
+    3. Exponential backoff on 429 errors
+    4. Model fallback chain (flash → pro → legacy)
+    """
+```
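Strategies 1 and 3 can be sketched with a rolling window of request timestamps plus exponential backoff. The 15-requests-per-60-seconds default mirrors the free-tier RPM figure quoted above. `SlidingWindowLimiter`, `call_with_backoff`, and `RateLimitError` are hypothetical names, and the clock and sleep functions are injectable so the logic can be exercised without actually waiting. Adaptive delay and model fallback (strategies 2 and 4) are not shown.

```python
import time
from collections import deque
from typing import Callable, Deque, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for a provider's HTTP 429 response."""


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any rolling window of window_s seconds."""

    def __init__(
        self,
        max_requests: int = 15,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_requests = max_requests
        self.window_s = window_s
        self.clock = clock
        self.sleep = sleep
        self._stamps: Deque[float] = deque()

    def acquire(self) -> None:
        """Block until one more request fits in the window, then record it."""
        while True:
            now = self.clock()
            # Forget requests that have aged out of the rolling window.
            while self._stamps and now - self._stamps[0] >= self.window_s:
                self._stamps.popleft()
            if len(self._stamps) < self.max_requests:
                self._stamps.append(now)
                return
            # Window is full: wait until the oldest request expires.
            self.sleep(self.window_s - (now - self._stamps[0]))


def call_with_backoff(
    fn: Callable[[], T],
    limiter: SlidingWindowLimiter,
    retries: int = 4,
    base_delay_s: float = 1.0,
) -> T:
    """Call fn under the limiter; on RateLimitError wait 1s, 2s, 4s, ... and retry."""
    for attempt in range(retries + 1):
        limiter.acquire()
        try:
            return fn()
        except RateLimitError:
            if attempt == retries:
                raise
            limiter.sleep(base_delay_s * (2 ** attempt))
    raise ValueError("retries must be >= 0")


# Usage with a fake clock so nothing actually waits.
class FakeClock:
    def __init__(self) -> None:
        self.t = 0.0
        self.slept: List[float] = []

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.slept.append(seconds)
        self.t += seconds


fake = FakeClock()
limiter = SlidingWindowLimiter(max_requests=2, window_s=60.0, clock=fake.now, sleep=fake.sleep)
for _ in range(3):
    limiter.acquire()
print(fake.slept)  # [60.0]: the third call waited for the window to roll over

calls: List[int] = []


def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


fake2 = FakeClock()
result = call_with_backoff(flaky, SlidingWindowLimiter(clock=fake2.now, sleep=fake2.sleep))
print(result, fake2.slept)  # ok [1.0, 2.0]
```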
+---
+## Configuration System
+```python
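+from dataclasses import dataclass, field
+from typing import List
+# Chunking
+@dataclass
+class ChunkingConfig:
+    max_chunk_tokens: int = 800
+    min_chunk_tokens: int = 100
+    preserve_imports: bool = True
+    calculate_complexity: bool = True
+# Privacy
+@dataclass
+class PrivacyConfig:
+    enable_path_obfuscation: bool = False
+# Indexing
+@dataclass
+class IndexingConfig:
+    enable_incremental_indexing: bool = True
+    batch_size: int = 100
+    # Mutable defaults need default_factory; the actual patterns are elided here
+    ignore_patterns: List[str] = field(default_factory=lambda: [...])
+# Retrieval
+@dataclass
+class RetrievalConfig:
+    enable_reranking: bool = True
+    retrieval_k: int = 10
+    rerank_top_k: int = 5
+    similarity_threshold: float = 0.5
+@dataclass
+class RAGConfig:
+    """Central configuration for the entire pipeline"""
+    chunking: ChunkingConfig = field(default_factory=ChunkingConfig)
+    privacy: PrivacyConfig = field(default_factory=PrivacyConfig)
+    indexing: IndexingConfig = field(default_factory=IndexingConfig)
+    retrieval: RetrievalConfig = field(default_factory=RetrievalConfig)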
+```
+---
+## File Dependency Map
+```
+app.py
+├── code_chatbot/ingestion/universal_ingestor.py
+├── code_chatbot/ingestion/indexer.py
+│   ├── code_chatbot/ingestion/chunker.py (StructuralChunker)
+│   ├── code_chatbot/ingestion/merkle_tree.py (MerkleTree)
+│   ├── code_chatbot/core/config.py (RAGConfig)
+│   └── code_chatbot/core/db_connection.py (Chroma client)
+├── code_chatbot/retrieval/rag.py (ChatEngine)
+│   ├── code_chatbot/retrieval/retriever_wrapper.py
+│   │   └── code_chatbot/retrieval/reranker.py (Reranker)
+│   ├── code_chatbot/retrieval/llm_retriever.py (LLMRetriever)
+│   ├── code_chatbot/agents/agent_workflow.py
+│   │   └── code_chatbot/agents/tools.py
+│   └── code_chatbot/core/prompts.py
+├── code_chatbot/analysis/ast_analysis.py (EnhancedCodeAnalyzer)
+└── code_chatbot/retrieval/graph_rag.py (GraphEnhancedRetriever)
+pages/1_⚡_Code_Studio.py
+├── components/file_explorer.py
+├── components/code_viewer.py
+├── components/panels.py
+└── components/style.py
+api/main.py
+├── api/routes/chat.py
+├── api/routes/index.py
+├── api/routes/health.py
+├── api/schemas.py
+└── api/state.py
+```
+---
+## Summary
+This project implements a sophisticated code understanding system with:
+1. **Multi-Source Ingestion**: ZIP, GitHub, Local, Web
+2. **Structural Chunking**: AST-aware code splitting
+3. **Hybrid Retrieval**: Vector + LLM + Graph-enhanced
+4. **Cross-Encoder Reranking**: Precision at the top
+5. **Agentic Workflow**: Multi-step reasoning with tools
+6. **Call Graph Analysis**: Function relationship tracking
+7. **Incremental Indexing**: Merkle tree change detection
+8. **Multi-LLM Support**: Gemini, Groq with fallbacks
+The architecture is designed for scalability, accuracy, and developer experience.

CODE_OF_CONDUCT.md DELETED

@@ -1,128 +0,0 @@
-# Contributor Covenant Code of Conduct
-## Our Pledge
-We as members, contributors, and leaders pledge to make participation in our
-community a harassment-free experience for everyone, regardless of age, body
-size, visible or invisible disability, ethnicity, sex characteristics, gender
-identity and expression, level of experience, education, socio-economic status,
-nationality, personal appearance, race, religion, or sexual identity
-and orientation.
-We pledge to act and interact in ways that contribute to an open, welcoming,
-diverse, inclusive, and healthy community.
-## Our Standards
-Examples of behavior that contributes to a positive environment for our
-community include:
-* Demonstrating empathy and kindness toward other people
-* Being respectful of differing opinions, viewpoints, and experiences
-* Giving and gracefully accepting constructive feedback
-* Accepting responsibility and apologizing to those affected by our mistakes,
-  and learning from the experience
-* Focusing on what is best not just for us as individuals, but for the
-  overall community
-Examples of unacceptable behavior include:
-* The use of sexualized language or imagery, and sexual attention or
-  advances of any kind
-* Trolling, insulting or derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or email
-  address, without their explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
-  professional setting
-## Enforcement Responsibilities
-Community leaders are responsible for clarifying and enforcing our standards of
-acceptable behavior and will take appropriate and fair corrective action in
-response to any behavior that they deem inappropriate, threatening, offensive,
-or harmful.
-Community leaders have the right and responsibility to remove, edit, or reject
-comments, commits, code, wiki edits, issues, and other contributions that are
-not aligned to this Code of Conduct, and will communicate reasons for moderation
-decisions when appropriate.
-## Scope
-This Code of Conduct applies within all community spaces, and also applies when
-an individual is officially representing the community in public spaces.
-Examples of representing our community include using an official e-mail address,
-posting via an official social media account, or acting as an appointed
-representative at an online or offline event.
-## Enforcement
-Instances of abusive, harassing, or otherwise unacceptable behavior may be
-reported to the community leaders responsible for enforcement at
-reported to the community leaders responsible for enforcement.
-All complaints will be reviewed and investigated promptly and fairly.
-All community leaders are obligated to respect the privacy and security of the
-reporter of any incident.
-## Enforcement Guidelines
-Community leaders will follow these Community Impact Guidelines in determining
-the consequences for any action they deem in violation of this Code of Conduct:
-### 1. Correction
-**Community Impact**: Use of inappropriate language or other behavior deemed
-unprofessional or unwelcome in the community.
-**Consequence**: A private, written warning from community leaders, providing
-clarity around the nature of the violation and an explanation of why the
-behavior was inappropriate. A public apology may be requested.
-### 2. Warning
-**Community Impact**: A violation through a single incident or series
-of actions.
-**Consequence**: A warning with consequences for continued behavior. No
-interaction with the people involved, including unsolicited interaction with
-those enforcing the Code of Conduct, for a specified period of time. This
-includes avoiding interactions in community spaces as well as external channels
-like social media. Violating these terms may lead to a temporary or
-permanent ban.
-### 3. Temporary Ban
-**Community Impact**: A serious violation of community standards, including
-sustained inappropriate behavior.
-**Consequence**: A temporary ban from any sort of interaction or public
-communication with the community for a specified period of time. No public or
-private interaction with the people involved, including unsolicited interaction
-with those enforcing the Code of Conduct, is allowed during this period.
-Violating these terms may lead to a permanent ban.
-### 4. Permanent Ban
-**Community Impact**: Demonstrating a pattern of violation of community
-standards, including sustained inappropriate behavior,  harassment of an
-individual, or aggression toward or disparagement of classes of individuals.
-**Consequence**: A permanent ban from any sort of public interaction within
-the community.
-## Attribution
-This Code of Conduct is adapted from the [Contributor Covenant][homepage],
-version 2.0, available at
-https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
-Community Impact Guidelines were inspired by [Mozilla's code of conduct
-enforcement ladder](https://github.com/mozilla/diversity).
-[homepage]: https://www.contributor-covenant.org
-For answers to common questions about this code of conduct, see the FAQ at
-https://www.contributor-covenant.org/faq. Translations are available at
-https://www.contributor-covenant.org/translations.

api/routes/index.py CHANGED

@@ -24,12 +24,12 @@ async def index_codebase(request: IndexRequest):
     try:
         # Import required modules
-        from code_chatbot.universal_ingestor import process_source
-        from code_chatbot.ast_analysis import ASTGraphBuilder
-        from code_chatbot.indexer import Indexer
-        from code_chatbot.graph_rag import GraphEnhancedRetriever
-        from code_chatbot.rag import ChatEngine
-        from code_chatbot.chunker import StructuralChunker
+        from code_chatbot.ingestion.universal_ingestor import process_source
+        from code_chatbot.analysis.ast_analysis import ASTGraphBuilder
+        from code_chatbot.ingestion.indexer import Indexer
+        from code_chatbot.retrieval.graph_rag import GraphEnhancedRetriever
+        from code_chatbot.retrieval.rag import ChatEngine
+        from code_chatbot.ingestion.chunker import StructuralChunker
         from langchain_community.vectorstores import Chroma, FAISS
         from langchain_community.vectorstores.utils import filter_complex_metadata

app.py CHANGED

@@ -2,11 +2,11 @@ import streamlit as st
 import os
 import shutil
 import time
-from code_chatbot.universal_ingestor import process_source
-from code_chatbot.indexer import Indexer
-from code_chatbot.rag import ChatEngine
-from code_chatbot.ast_analysis import ASTGraphBuilder
-from code_chatbot.graph_rag import GraphEnhancedRetriever
+from code_chatbot.ingestion.universal_ingestor import process_source
+from code_chatbot.ingestion.indexer import Indexer
+from code_chatbot.retrieval.rag import ChatEngine
+from code_chatbot.analysis.ast_analysis import ASTGraphBuilder
+from code_chatbot.retrieval.graph_rag import GraphEnhancedRetriever
 import logging
 from dotenv import load_dotenv
@@ -83,7 +83,7 @@ if not st.session_state.processed_files:
                  st.error(f"Please configure {embedding_provider} API Key for embeddings in the sidebar.")
             else:
                 # Use the new progress-tracked indexer
-                from code_chatbot.indexing_progress import index_with_progress
+                from code_chatbot.ingestion.indexing_progress import index_with_progress
                 chat_engine, success, repo_files, workspace_root = index_with_progress(
                     source_input=source_input,

architecture_viz.jsx ADDED

	@@ -0,0 +1,625 @@

+import React, { useState } from 'react';
+import { ChevronRight, ChevronDown, Database, Code, Brain, Search, FileText, GitBranch, Layers, Workflow, Server, Cpu, ArrowRight, Zap } from 'lucide-react';
+const ArchitectureViz = () => {
+  const [activeTab, setActiveTab] = useState('overview');
+  const [expandedSections, setExpandedSections] = useState({});
+  const toggleSection = (section) => {
+    setExpandedSections(prev => ({
+      ...prev,
+      [section]: !prev[section]
+    }));
+  };
+  const tabs = [
+    { id: 'overview', label: 'System Overview', icon: Layers },
+    { id: 'rag', label: 'RAG Pipeline', icon: Search },
+    { id: 'ast', label: 'AST & Graphs', icon: GitBranch },
+    { id: 'chunking', label: 'Code Chunking', icon: Code },
+    { id: 'agent', label: 'Agentic Workflow', icon: Brain },
+    { id: 'retrieval', label: 'Retrieval System', icon: Database },
+  ];
+  const ComponentCard = ({ title, description, icon: Icon, color, children }) => (
+    <div className={`bg-slate-800 rounded-lg p-4 border-l-4 ${color} hover:bg-slate-700 transition-all`}>
+      <div className="flex items-center gap-2 mb-2">
+        <Icon className="w-5 h-5 text-slate-300" />
+        <h3 className="font-semibold text-white">{title}</h3>
+      </div>
+      <p className="text-slate-400 text-sm mb-2">{description}</p>
+      {children}
+    </div>
+  );
+  const FlowArrow = () => (
+    <div className="flex justify-center py-2">
+      <ArrowRight className="w-6 h-6 text-slate-500" />
+    </div>
+  );
+  const renderOverview = () => (
+    <div className="space-y-6">
+      <div className="bg-gradient-to-r from-purple-900/50 to-blue-900/50 rounded-xl p-6 border border-purple-500/30">
+        <h2 className="text-2xl font-bold text-white mb-2 flex items-center gap-2">
+          <Zap className="w-6 h-6 text-yellow-400" />
+          Code Crawler Architecture
+        </h2>
+        <p className="text-slate-300">
+          An AI-powered codebase assistant combining RAG, AST analysis, Graph databases, and Agentic workflows.
+        </p>
+      </div>
+      <div className="grid grid-cols-1 md:grid-cols-3 gap-4">
+        <ComponentCard
+          title="Data Ingestion"
+          description="Universal ingestor supporting ZIP, GitHub, Local, Web"
+          icon={FileText}
+          color="border-green-500"
+        >
+          <div className="mt-2 space-y-1">
+            <div className="text-xs bg-slate-700 rounded px-2 py-1">ZIPFileManager</div>
+            <div className="text-xs bg-slate-700 rounded px-2 py-1">GitHubRepoManager</div>
+            <div className="text-xs bg-slate-700 rounded px-2 py-1">LocalDirectoryManager</div>
+            <div className="text-xs bg-slate-700 rounded px-2 py-1">WebDocManager</div>
+          </div>
+        </ComponentCard>
+        <ComponentCard
+          title="Processing"
+          description="AST parsing, chunking, embeddings, graph building"
+          icon={Cpu}
+          color="border-blue-500"
+        >
+          <div className="mt-2 space-y-1">
+            <div className="text-xs bg-slate-700 rounded px-2 py-1">StructuralChunker (tree-sitter)</div>
+            <div className="text-xs bg-slate-700 rounded px-2 py-1">EnhancedCodeAnalyzer</div>
+            <div className="text-xs bg-slate-700 rounded px-2 py-1">Gemini/HuggingFace Embeddings</div>
+          </div>
+        </ComponentCard>
+        <ComponentCard
+          title="Storage"
+          description="Vector DB and AST knowledge graph"
+          icon={Database}
+          color="border-purple-500"
+        >
+          <div className="mt-2 space-y-1">
+            <div className="text-xs bg-slate-700 rounded px-2 py-1">Chroma / FAISS / Qdrant</div>
+            <div className="text-xs bg-slate-700 rounded px-2 py-1">GraphML (NetworkX)</div>
+            <div className="text-xs bg-slate-700 rounded px-2 py-1">Merkle Tree Snapshots</div>
+          </div>
+        </ComponentCard>
+      </div>
+      <div className="bg-slate-800 rounded-lg p-4">
+        <h3 className="font-semibold text-white mb-3 flex items-center gap-2">
+          <Workflow className="w-5 h-5" />
+          Data Flow
+        </h3>
+        <div className="flex flex-wrap items-center justify-center gap-2 text-sm">
+          <span className="bg-green-600/30 text-green-300 px-3 py-1 rounded-full">Input Source</span>
+          <ArrowRight className="w-4 h-4 text-slate-500" />
+          <span className="bg-blue-600/30 text-blue-300 px-3 py-1 rounded-full">Ingestor</span>
+          <ArrowRight className="w-4 h-4 text-slate-500" />
+          <span className="bg-purple-600/30 text-purple-300 px-3 py-1 rounded-full">Chunker</span>
+          <ArrowRight className="w-4 h-4 text-slate-500" />
+          <span className="bg-pink-600/30 text-pink-300 px-3 py-1 rounded-full">Embeddings</span>
+          <ArrowRight className="w-4 h-4 text-slate-500" />
+          <span className="bg-orange-600/30 text-orange-300 px-3 py-1 rounded-full">Vector DB</span>
+        </div>
+      </div>
+      <div className="grid grid-cols-1 md:grid-cols-2 gap-4">
+        <ComponentCard
+          title="Retrieval Layer"
+          description="Multi-stage retrieval with reranking"
+          icon={Search}
+          color="border-yellow-500"
+        >
+          <div className="mt-2 text-xs space-y-1">
+            <div className="flex items-center gap-2">
+              <span className="w-2 h-2 bg-yellow-500 rounded-full"></span>
+              <span className="text-slate-300">Vector Retriever (60%)</span>
+            </div>
+            <div className="flex items-center gap-2">
+              <span className="w-2 h-2 bg-yellow-500 rounded-full"></span>
+              <span className="text-slate-300">LLM Retriever (40%)</span>
+            </div>
+            <div className="flex items-center gap-2">
+              <span className="w-2 h-2 bg-yellow-500 rounded-full"></span>
+              <span className="text-slate-300">Graph Enhancement</span>
+            </div>
+            <div className="flex items-center gap-2">
+              <span className="w-2 h-2 bg-yellow-500 rounded-full"></span>
+              <span className="text-slate-300">Cross-Encoder Reranker</span>
+            </div>
+          </div>
+        </ComponentCard>
+        <ComponentCard
+          title="Chat Engine"
+          description="Dual-mode: Linear RAG or Agentic"
+          icon={Brain}
+          color="border-red-500"
+        >
+          <div className="mt-2 text-xs space-y-1">
+            <div className="flex items-center gap-2">
+              <span className="w-2 h-2 bg-red-500 rounded-full"></span>
+              <span className="text-slate-300">Linear RAG (simple Q&A)</span>
+            </div>
+            <div className="flex items-center gap-2">
+              <span className="w-2 h-2 bg-red-500 rounded-full"></span>
+              <span className="text-slate-300">Agentic Workflow (LangGraph)</span>
+            </div>
+            <div className="flex items-center gap-2">
+              <span className="w-2 h-2 bg-red-500 rounded-full"></span>
+              <span className="text-slate-300">Tools: search, read, list, call_graph</span>
+            </div>
+          </div>
+        </ComponentCard>
+      </div>
+    </div>
+  );
+  const renderRAG = () => (
+    <div className="space-y-6">
+      <div className="bg-slate-800 rounded-xl p-6">
+        <h2 className="text-xl font-bold text-white mb-4">RAG Pipeline Implementation</h2>
+        <p className="text-slate-400 mb-4">
+          The RAG (Retrieval-Augmented Generation) system combines vector search with LLM-based file selection
+          and cross-encoder reranking for high-precision code retrieval.
+        </p>
+        <div className="space-y-4">
+          <div className="bg-slate-700/50 rounded-lg p-4">
+            <h3 className="font-semibold text-green-400 mb-2">1. Query Processing</h3>
+            <code className="text-sm text-slate-300 block bg-slate-900 p-3 rounded whitespace-pre overflow-x-auto">
+              {`query = "How does authentication work?"
+# Optionally expand with multi-query
+expanded_queries = multi_query_expander(query)`}
+            </code>
+          </div>
+          <div className="bg-slate-700/50 rounded-lg p-4">
+            <h3 className="font-semibold text-blue-400 mb-2">2. Hybrid Retrieval</h3>
+            <code className="text-sm text-slate-300 block bg-slate-900 p-3 rounded whitespace-pre overflow-x-auto">
+              {`# Vector similarity search (60% weight)
+vector_docs = chroma_db.similarity_search(query, k=10)
+# LLM-based file selection (40% weight)
+llm_docs = llm_retriever.select_files(query, file_tree)
+# Combine with EnsembleRetriever
+combined = ensemble([vector_docs, llm_docs], weights=[0.6, 0.4])`}
+            </code>
+          </div>
+          <div className="bg-slate-700/50 rounded-lg p-4">
+            <h3 className="font-semibold text-purple-400 mb-2">3. Graph Enhancement</h3>
+            <code className="text-sm text-slate-300 block bg-slate-900 p-3 rounded whitespace-pre overflow-x-auto">
+              {`# For each retrieved doc, find related files via AST graph
+augmented_docs = list(combined)
+for doc in combined:
+    for _, neighbor, attrs in ast_graph.out_edges(doc.file_path, data=True):
+        if attrs.get("relation") in ("imports", "calls"):
+            augmented_docs.append(read_file(neighbor))`}
+            </code>
+          </div>
+          <div className="bg-slate-700/50 rounded-lg p-4">
+            <h3 className="font-semibold text-yellow-400 mb-2">4. Cross-Encoder Reranking</h3>
+            <code className="text-sm text-slate-300 block bg-slate-900 p-3 rounded whitespace-pre overflow-x-auto">
+              {`# Score each (query, document) pair with cross-encoder
+pairs = [[query, doc.content] for doc in augmented_docs]
+scores = cross_encoder.predict(pairs)
+# Return top 5 by score (highest first)
+ranked = sorted(zip(augmented_docs, scores), key=lambda p: p[1], reverse=True)
+final_docs = [doc for doc, _ in ranked[:5]]`}
+            </code>
+          </div>
+          <div className="bg-slate-700/50 rounded-lg p-4">
+            <h3 className="font-semibold text-red-400 mb-2">5. Generation</h3>
+            <code className="text-sm text-slate-300 block bg-slate-900 p-3 rounded whitespace-pre overflow-x-auto">
+              {`# Build context from retrieved docs
+context = format_docs(final_docs)
+# Generate answer with LLM
+prompt = system_prompt.format(context=context)
+answer = llm.invoke([SystemMessage(prompt), HumanMessage(query)])`}
+            </code>
+          </div>
+        </div>
+      </div>
+      <div className="bg-slate-800 rounded-xl p-6">
+        <h3 className="font-semibold text-white mb-3">Key Files</h3>
+        <div className="grid grid-cols-2 gap-2 text-sm">
+          <div className="bg-slate-700 rounded p-2">
+            <span className="text-blue-400">code_chatbot/retrieval/rag.py</span>
+            <p className="text-slate-400 text-xs">ChatEngine class</p>
+          </div>
+          <div className="bg-slate-700 rounded p-2">
+            <span className="text-blue-400">code_chatbot/retrieval/retriever_wrapper.py</span>
+            <p className="text-slate-400 text-xs">RerankingRetriever</p>
+          </div>
+          <div className="bg-slate-700 rounded p-2">
+            <span className="text-blue-400">code_chatbot/retrieval/llm_retriever.py</span>
+            <p className="text-slate-400 text-xs">LLM-based file selection</p>
+          </div>
+          <div className="bg-slate-700 rounded p-2">
+            <span className="text-blue-400">code_chatbot/retrieval/reranker.py</span>
+            <p className="text-slate-400 text-xs">Cross-encoder reranking</p>
+          </div>
+        </div>
+      </div>
+    </div>
+  );
+  const renderAST = () => (
+    <div className="space-y-6">
+      <div className="bg-slate-800 rounded-xl p-6">
+        <h2 className="text-xl font-bold text-white mb-4">AST Analysis & Knowledge Graph</h2>
+        <p className="text-slate-400 mb-4">
+          Uses <span className="text-green-400">tree-sitter</span> to parse code into Abstract Syntax Trees,
+          then builds a <span className="text-blue-400">NetworkX</span> directed graph capturing code relationships.
+        </p>
+        <div className="grid grid-cols-1 md:grid-cols-2 gap-4 mb-6">
+          <div className="bg-slate-700/50 rounded-lg p-4">
+            <h3 className="font-semibold text-purple-400 mb-2">Node Types</h3>
+            <ul className="text-sm text-slate-300 space-y-1">
+              <li className="flex items-center gap-2">
+                <span className="w-3 h-3 bg-green-500 rounded"></span> file
+              </li>
+              <li className="flex items-center gap-2">
+                <span className="w-3 h-3 bg-blue-500 rounded"></span> class
+              </li>
+              <li className="flex items-center gap-2">
+                <span className="w-3 h-3 bg-purple-500 rounded"></span> function
+              </li>
+              <li className="flex items-center gap-2">
+                <span className="w-3 h-3 bg-yellow-500 rounded"></span> method
+              </li>
+            </ul>
+          </div>
+          <div className="bg-slate-700/50 rounded-lg p-4">
+            <h3 className="font-semibold text-purple-400 mb-2">Edge Types (Relations)</h3>
+            <ul className="text-sm text-slate-300 space-y-1">
+              <li><span className="text-green-400">defines</span> - file → class/function</li>
+              <li><span className="text-blue-400">has_method</span> - class → method</li>
+              <li><span className="text-purple-400">calls</span> - function → function</li>
+              <li><span className="text-yellow-400">imports</span> - file → module</li>
+              <li><span className="text-red-400">inherits_from</span> - class → class</li>
+            </ul>
+          </div>
+        </div>
+        <div className="bg-slate-900 rounded-lg p-4 overflow-x-auto">
+          <h3 className="font-semibold text-white mb-2">Example: Parsing Python Code</h3>
+          <pre className="text-sm text-slate-300">
+{`# Source Code
+class UserService:
+    def get_user(self, user_id):
+        return self.db.find(user_id)  # calls db.find
+# Generated Graph
+(file: user_service.py)
+    │
+    └──defines──▶ (class: UserService)
+                      │
+                      └──has_method──▶ (method: get_user)
+                                           │
+                                           └──calls──▶ (function: db.find)`}
+          </pre>
+        </div>
+      </div>
+      <div className="bg-slate-800 rounded-xl p-6">
+        <h3 className="font-semibold text-white mb-3">Call Graph Tools</h3>
+        <div className="space-y-3">
+          <div className="bg-slate-700 rounded p-3">
+            <code className="text-green-400">find_callers("authenticate")</code>
+            <p className="text-slate-400 text-sm mt-1">→ Returns all functions that call authenticate()</p>
+          </div>
+          <div className="bg-slate-700 rounded p-3">
+            <code className="text-blue-400">find_callees("process_request")</code>
+            <p className="text-slate-400 text-sm mt-1">→ Returns all functions called by process_request()</p>
+          </div>
+          <div className="bg-slate-700 rounded p-3">
+            <code className="text-purple-400">find_call_chain("main", "save_to_db")</code>
+            <p className="text-slate-400 text-sm mt-1">→ Returns execution paths from main() to save_to_db()</p>
+          </div>
+        </div>
+      </div>
+    </div>
+  );
+  const renderChunking = () => (
+    <div className="space-y-6">
+      <div className="bg-slate-800 rounded-xl p-6">
+        <h2 className="text-xl font-bold text-white mb-4">Structural Code Chunking</h2>
+        <p className="text-slate-400 mb-4">
+          Unlike naive text splitting, this system uses <span className="text-green-400">tree-sitter</span> to
+          chunk code at semantic boundaries (functions, classes) while respecting token limits.
+        </p>
+        <div className="grid grid-cols-1 md:grid-cols-2 gap-4 mb-6">
+          <div className="bg-red-900/30 border border-red-500/30 rounded-lg p-4">
+            <h3 className="font-semibold text-red-400 mb-2">❌ Naive Text Chunking</h3>
+            <pre className="text-xs text-slate-300 bg-slate-900 p-2 rounded">
+{`def process_data():
+    data = load()
+    # ──────────────── CHUNK BREAK ────
+    result = transform(data)
+    return result  # Broken mid-function!`}
+            </pre>
+          </div>
+          <div className="bg-green-900/30 border border-green-500/30 rounded-lg p-4">
+            <h3 className="font-semibold text-green-400 mb-2">✓ Structural Chunking</h3>
+            <pre className="text-xs text-slate-300 bg-slate-900 p-2 rounded">
+{`# CHUNK 1 - Complete function
+def process_data():
+    data = load()
+    result = transform(data)
+    return result
+# CHUNK 2 - Complete function
+def another_func():
+    ...`}
+            </pre>
+          </div>
+        </div>
+        <div className="bg-slate-700/50 rounded-lg p-4">
+          <h3 className="font-semibold text-blue-400 mb-2">Chunking Algorithm</h3>
+          <ol className="text-sm text-slate-300 space-y-2">
+            <li className="flex items-start gap-2">
+              <span className="bg-blue-500 text-white w-5 h-5 rounded-full flex items-center justify-center text-xs">1</span>
+              <span>Parse file into AST using tree-sitter</span>
+            </li>
+            <li className="flex items-start gap-2">
+              <span className="bg-blue-500 text-white w-5 h-5 rounded-full flex items-center justify-center text-xs">2</span>
+              <span>Recursively visit nodes (functions, classes, etc.)</span>
+            </li>
+            <li className="flex items-start gap-2">
+              <span className="bg-blue-500 text-white w-5 h-5 rounded-full flex items-center justify-center text-xs">3</span>
+              <span>If node fits in max_tokens (800) → return as chunk</span>
+            </li>
+            <li className="flex items-start gap-2">
+              <span className="bg-blue-500 text-white w-5 h-5 rounded-full flex items-center justify-center text-xs">4</span>
+              <span>If too large → split into children, recurse</span>
+            </li>
+            <li className="flex items-start gap-2">
+              <span className="bg-blue-500 text-white w-5 h-5 rounded-full flex items-center justify-center text-xs">5</span>
+              <span>Merge neighboring small chunks to avoid fragments</span>
+            </li>
+          </ol>
+        </div>
+      </div>
+      <div className="bg-slate-800 rounded-xl p-6">
+        <h3 className="font-semibold text-white mb-3">Rich Chunk Metadata</h3>
+        <div className="bg-slate-900 rounded-lg p-4">
+          <pre className="text-sm text-slate-300">
+{`FileChunk {
+  file_path: "src/auth/login.py",
+  start_byte: 245,
+  end_byte: 892,
+  line_range: "L12-L45",
+  language: "python",
+  chunk_type: "function_definition",
+  name: "authenticate",
+  // Enhanced metadata
+  symbols_defined: ["authenticate", "verify_token"],
+  imports_used: ["from jwt import decode"],
+  complexity_score: 7,  // Cyclomatic complexity
+  parent_context: "AuthService"  // Parent class
+}`}
+          </pre>
+        </div>
+      </div>
+    </div>
+  );
+  const renderAgent = () => (
+    <div className="space-y-6">
+      <div className="bg-slate-800 rounded-xl p-6">
+        <h2 className="text-xl font-bold text-white mb-4">Agentic Workflow (LangGraph)</h2>
+        <p className="text-slate-400 mb-4">
+          The agent reasons in multiple steps and calls tools between them, so it can handle
+          analyses, such as tracing a call chain, that a single retrieval pass cannot.
+        </p>
+        <div className="bg-slate-900 rounded-lg p-4 mb-6">
+          <h3 className="font-semibold text-purple-400 mb-3">Agent State Machine</h3>
+          <div className="flex flex-col items-center space-y-2">
+            <div className="bg-green-600/30 text-green-300 px-4 py-2 rounded-lg">START</div>
+            <ArrowRight className="w-4 h-4 text-slate-500 rotate-90" />
+            <div className="bg-blue-600/30 text-blue-300 px-6 py-3 rounded-lg text-center">
+              <div className="font-semibold">AGENT NODE</div>
+              <div className="text-xs mt-1">Process messages → Call LLM → Decide action</div>
+            </div>
+            <div className="flex items-center gap-4">
+              <div className="flex flex-col items-center">
+                <span className="text-xs text-slate-400">tool_call?</span>
+                <ArrowRight className="w-4 h-4 text-slate-500 rotate-90" />
+                <div className="bg-yellow-600/30 text-yellow-300 px-4 py-2 rounded-lg text-center">
+                  <div className="font-semibold">TOOLS NODE</div>
+                  <div className="text-xs">Execute tools</div>
+                </div>
+              </div>
+              <div className="flex flex-col items-center">
+                <span className="text-xs text-slate-400">final answer?</span>
+                <ArrowRight className="w-4 h-4 text-slate-500 rotate-90" />
+                <div className="bg-red-600/30 text-red-300 px-4 py-2 rounded-lg">END</div>
+              </div>
+            </div>
+          </div>
+        </div>
+        <div className="grid grid-cols-2 md:grid-cols-3 gap-3">
+          <div className="bg-slate-700 rounded-lg p-3">
+            <code className="text-green-400 text-sm">search_codebase</code>
+            <p className="text-xs text-slate-400 mt-1">Vector search in codebase</p>
+          </div>
+          <div className="bg-slate-700 rounded-lg p-3">
+            <code className="text-blue-400 text-sm">read_file</code>
+            <p className="text-xs text-slate-400 mt-1">Read complete file content</p>
+          </div>
+          <div className="bg-slate-700 rounded-lg p-3">
+            <code className="text-purple-400 text-sm">list_files</code>
+            <p className="text-xs text-slate-400 mt-1">Directory listing</p>
+          </div>
+          <div className="bg-slate-700 rounded-lg p-3">
+            <code className="text-yellow-400 text-sm">find_callers</code>
+            <p className="text-xs text-slate-400 mt-1">Who calls this function?</p>
+          </div>
+          <div className="bg-slate-700 rounded-lg p-3">
+            <code className="text-red-400 text-sm">find_callees</code>
+            <p className="text-xs text-slate-400 mt-1">What does this call?</p>
+          </div>
+          <div className="bg-slate-700 rounded-lg p-3">
+            <code className="text-pink-400 text-sm">find_call_chain</code>
+            <p className="text-xs text-slate-400 mt-1">Trace execution path</p>
+          </div>
+        </div>
+      </div>
+      <div className="bg-slate-800 rounded-xl p-6">
+        <h3 className="font-semibold text-white mb-3">Example Agent Execution</h3>
+        <div className="space-y-3 text-sm">
+          <div className="bg-slate-700 rounded p-3">
+            <span className="text-blue-400">User:</span>
+            <span className="text-slate-300 ml-2">"How does login handle invalid passwords?"</span>
+          </div>
+          <div className="bg-slate-700 rounded p-3">
+            <span className="text-purple-400">Agent Thought:</span>
+            <span className="text-slate-300 ml-2">I need to find the login function first.</span>
+            <div className="mt-1">
+              <span className="text-yellow-400">→ Tool Call:</span>
+              <code className="text-green-300 ml-2">search_codebase("login authentication")</code>
+            </div>
+          </div>
+          <div className="bg-slate-700 rounded p-3">
+            <span className="text-purple-400">Agent Thought:</span>
+            <span className="text-slate-300 ml-2">Found authenticate(). Let me see the full implementation.</span>
+            <div className="mt-1">
+              <span className="text-yellow-400">→ Tool Call:</span>
+              <code className="text-green-300 ml-2">read_file("src/auth/login.py")</code>
+            </div>
+          </div>
+          <div className="bg-slate-700 rounded p-3">
+            <span className="text-purple-400">Agent Thought:</span>
+            <span className="text-slate-300 ml-2">It calls verify_password(). Let me check that.</span>
+            <div className="mt-1">
+              <span className="text-yellow-400">→ Tool Call:</span>
+              <code className="text-green-300 ml-2">find_callees("authenticate")</code>
+            </div>
+          </div>
+          <div className="bg-green-700/50 rounded p-3">
+            <span className="text-green-400">Final Answer:</span>
+            <span className="text-slate-300 ml-2">The login handles invalid passwords by...</span>
+          </div>
+        </div>
+      </div>
+    </div>
+  );
+  const renderRetrieval = () => (
+    <div className="space-y-6">
+      <div className="bg-slate-800 rounded-xl p-6">
+        <h2 className="text-xl font-bold text-white mb-4">Multi-Stage Retrieval System</h2>
+        <div className="space-y-4">
+          <div className="bg-green-900/30 border-l-4 border-green-500 rounded-r-lg p-4">
+            <h3 className="font-semibold text-green-400">Stage 1: Vector Retrieval (k=10)</h3>
+            <p className="text-slate-300 text-sm">Semantic similarity search in Chroma/FAISS using embeddings</p>
+          </div>
+          <div className="bg-blue-900/30 border-l-4 border-blue-500 rounded-r-lg p-4">
+            <h3 className="font-semibold text-blue-400">Stage 2: LLM File Selection</h3>
+            <p className="text-slate-300 text-sm">LLM analyzes file tree structure and selects relevant files</p>
+          </div>
+          <div className="bg-purple-900/30 border-l-4 border-purple-500 rounded-r-lg p-4">
+            <h3 className="font-semibold text-purple-400">Stage 3: Ensemble Combination</h3>
+            <p className="text-slate-300 text-sm">Weighted merge: 60% vector + 40% LLM selection</p>
+          </div>
+          <div className="bg-yellow-900/30 border-l-4 border-yellow-500 rounded-r-lg p-4">
+            <h3 className="font-semibold text-yellow-400">Stage 4: Graph Enhancement</h3>
+            <p className="text-slate-300 text-sm">Add related files from AST graph (imports, calls)</p>
+          </div>
+          <div className="bg-red-900/30 border-l-4 border-red-500 rounded-r-lg p-4">
+            <h3 className="font-semibold text-red-400">Stage 5: Cross-Encoder Reranking</h3>
+            <p className="text-slate-300 text-sm">Score each (query, doc) pair, return top 5</p>
+          </div>
+        </div>
+      </div>
+      <div className="bg-slate-800 rounded-xl p-6">
+        <h3 className="font-semibold text-white mb-3">Vector DB Support</h3>
+        <div className="grid grid-cols-3 gap-4">
+          <div className="bg-slate-700 rounded-lg p-4 text-center">
+            <Database className="w-8 h-8 text-green-400 mx-auto mb-2" />
+            <div className="font-semibold text-white">Chroma</div>
+            <div className="text-xs text-slate-400">Default, local</div>
+          </div>
+          <div className="bg-slate-700 rounded-lg p-4 text-center">
+            <Database className="w-8 h-8 text-blue-400 mx-auto mb-2" />
+            <div className="font-semibold text-white">FAISS</div>
+            <div className="text-xs text-slate-400">Fallback, fast</div>
+          </div>
+          <div className="bg-slate-700 rounded-lg p-4 text-center">
+            <Database className="w-8 h-8 text-purple-400 mx-auto mb-2" />
+            <div className="font-semibold text-white">Qdrant</div>
+            <div className="text-xs text-slate-400">Cloud option</div>
+          </div>
+        </div>
+      </div>
+    </div>
+  );
+  return (
+    <div className="min-h-screen bg-slate-900 text-white p-6">
+      <div className="max-w-6xl mx-auto">
+        <h1 className="text-3xl font-bold text-center mb-2">🕷️ Code Crawler Architecture</h1>
+        <p className="text-slate-400 text-center mb-6">Interactive System Documentation</p>
+        <div className="flex flex-wrap gap-2 mb-6 justify-center">
+          {tabs.map(tab => (
+            <button
+              key={tab.id}
+              onClick={() => setActiveTab(tab.id)}
+              className={`flex items-center gap-2 px-4 py-2 rounded-lg transition-all ${
+                activeTab === tab.id
+                  ? 'bg-purple-600 text-white'
+                  : 'bg-slate-800 text-slate-400 hover:bg-slate-700'
+              }`}
+            >
+              <tab.icon className="w-4 h-4" />
+              {tab.label}
+            </button>
+          ))}
+        </div>
+        <div className="transition-all">
+          {activeTab === 'overview' && renderOverview()}
+          {activeTab === 'rag' && renderRAG()}
+          {activeTab === 'ast' && renderAST()}
+          {activeTab === 'chunking' && renderChunking()}
+          {activeTab === 'agent' && renderAgent()}
+          {activeTab === 'retrieval' && renderRetrieval()}
+        </div>
+      </div>
+    </div>
+  );
+};
+export default ArchitectureViz;
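The chunking panel above describes a recursive, size-bounded split: emit a node whole if it fits in `max_tokens`, otherwise recurse into its children, then merge small neighbors. The following is a minimal sketch of that idea in Python, not the repository's `StructuralChunker`. `Node` is a hypothetical stand-in for a tree-sitter node, and `count_tokens` assumes that whitespace-separated words approximate tokens.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a parsed syntax node (not tree-sitter's Node class)."""
    type: str
    text: str
    children: List["Node"] = field(default_factory=list)


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokenizer tokens.
    return len(text.split())


def chunk_node(node: Node, max_tokens: int = 800) -> List[str]:
    """Emit the node whole if it fits; otherwise recurse into its children."""
    if count_tokens(node.text) <= max_tokens or not node.children:
        # An oversized leaf is emitted as-is; a real chunker would split it further.
        return [node.text]
    chunks: List[str] = []
    for child in node.children:
        # Simplification: text between children (e.g. a class header line) is dropped.
        chunks.extend(chunk_node(child, max_tokens))
    return chunks


def merge_small(chunks: List[str], max_tokens: int = 800) -> List[str]:
    """Greedily merge neighboring chunks while the combined size still fits."""
    merged: List[str] = []
    for chunk in chunks:
        if merged and count_tokens(merged[-1]) + count_tokens(chunk) <= max_tokens:
            merged[-1] = merged[-1] + "\n" + chunk
        else:
            merged.append(chunk)
    return merged


funcs = [Node("function_definition", "def f():\n    return 1"),
         Node("function_definition", "def g():\n    return 2")]
cls = Node("class_definition", "class A:\n" + "\n".join(n.text for n in funcs), funcs)
print(merge_small(chunk_node(cls, max_tokens=5), max_tokens=5))
```

With a tiny budget of 5 "tokens", the class is too large and is split into its two methods. The methods are not re-merged because together they exceed the budget.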

code_chatbot/{agent_workflow.py → agents/agent_workflow.py} RENAMED

@@ -5,7 +5,7 @@ from langchain_core.messages import BaseMessage
 from langchain_core.tools import tool
 from langgraph.graph import StateGraph, END
 from langgraph.prebuilt import ToolNode
-from code_chatbot.rate_limiter import get_rate_limiter
 # Define State
 class AgentState(TypedDict):
@@ -49,7 +49,7 @@ def create_agent_graph(llm, retriever, repo_name: str = "Codebase", repo_dir: st
         return result
     # 2. Import File System Tools
-    from code_chatbot.tools import get_filesystem_tools, get_call_graph_tools
     # 3. Combine Tools
     fs_tools = get_filesystem_tools(repo_dir)

 from langchain_core.tools import tool
 from langgraph.graph import StateGraph, END
 from langgraph.prebuilt import ToolNode
+from code_chatbot.core.rate_limiter import get_rate_limiter
 # Define State
 class AgentState(TypedDict):
         return result
     # 2. Import File System Tools
+    from code_chatbot.agents.tools import get_filesystem_tools, get_call_graph_tools
     # 3. Combine Tools
     fs_tools = get_filesystem_tools(repo_dir)
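`create_agent_graph` wires an LLM and the filesystem and call-graph tools into a LangGraph `StateGraph` with a `ToolNode`. The control flow drawn in the Agent State Machine panel (START → AGENT → TOOLS or END, looping back to AGENT) can be sketched without LangGraph as a plain loop. `Decision`, the scripted `decide` function and the `max_steps` cap are hypothetical illustrations, not the project's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Decision:
    """What the (hypothetical) LLM decided on one turn: call a tool, or answer."""
    tool: Optional[str] = None
    arg: str = ""
    answer: Optional[str] = None


def run_agent(decide: Callable[[List[str]], Decision],
              tools: Dict[str, Callable[[str], str]],
              question: str,
              max_steps: int = 8) -> str:
    messages = [f"user: {question}"]                          # START
    for _ in range(max_steps):
        decision = decide(messages)                           # AGENT node: ask the LLM
        if decision.tool is None:                             # final answer -> END
            return decision.answer or ""
        result = tools[decision.tool](decision.arg)           # TOOLS node: run the tool
        messages.append(f"tool[{decision.tool}]: {result}")  # loop back to AGENT
    return "stopped: step limit reached"                      # assumed safety cap


# Scripted "LLM" replaying the Example Agent Execution trace.
script = iter([Decision("search_codebase", "login authentication"),
               Decision("read_file", "src/auth/login.py"),
               Decision(answer="The login handles invalid passwords by ...")])
tools = {
    "search_codebase": lambda q: "authenticate() in src/auth/login.py",
    "read_file": lambda p: f"<contents of {p}>",
}
answer = run_agent(lambda msgs: next(script), tools,
                   "How does login handle invalid passwords?")
print(answer)
```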

code_chatbot/{crews → agents/crews}/__init__.py RENAMED

File without changes

code_chatbot/{tools.py → agents/tools.py} RENAMED

File without changes
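`agents/tools.py` provides `get_call_graph_tools`, which backs the `find_callers`, `find_callees` and `find_call_chain` tools listed in the architecture page. Its body is not part of this diff. The sketch below assumes the call graph is an adjacency map of caller → callees (a hypothetical representation) and shows one straightforward way to answer each question, using breadth-first search for the shortest call chain.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Hypothetical call graph: caller -> functions it calls.
CALLS: Dict[str, Set[str]] = {
    "login_view": {"authenticate"},
    "authenticate": {"verify_password", "verify_token"},
    "verify_password": {"hash_password"},
}


def find_callees(graph: Dict[str, Set[str]], fn: str) -> List[str]:
    """What does fn call? (direct edges only)"""
    return sorted(graph.get(fn, set()))


def find_callers(graph: Dict[str, Set[str]], fn: str) -> List[str]:
    """Who calls fn? (reverse lookup over all edges)"""
    return sorted(caller for caller, callees in graph.items() if fn in callees)


def find_call_chain(graph: Dict[str, Set[str]], start: str, target: str) -> Optional[List[str]]:
    """Shortest call path from start to target via breadth-first search, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in sorted(graph.get(path[-1], set())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(find_call_chain(CALLS, "login_view", "hash_password"))
```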

code_chatbot/analysis/__init__.py ADDED

File without changes

code_chatbot/{ast_analysis.py → analysis/ast_analysis.py} RENAMED

File without changes

code_chatbot/{code_symbols.py → analysis/code_symbols.py} RENAMED

@@ -4,7 +4,7 @@ import logging
 from typing import List, Tuple, Optional
 from tree_sitter import Node
-from code_chatbot.chunker import StructuralChunker
 logger = logging.getLogger(__name__)

 from typing import List, Tuple, Optional
 from tree_sitter import Node
+from code_chatbot.ingestion.chunker import StructuralChunker
 logger = logging.getLogger(__name__)

code_chatbot/core/__init__.py ADDED

File without changes

code_chatbot/{config.py → core/config.py} RENAMED

File without changes

code_chatbot/{db_connection.py → core/db_connection.py} RENAMED

File without changes

code_chatbot/{path_obfuscator.py → core/path_obfuscator.py} RENAMED

File without changes

code_chatbot/{prompts.py → core/prompts.py} RENAMED

File without changes

code_chatbot/{rate_limiter.py → core/rate_limiter.py} RENAMED

File without changes
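The sidebar (further down in this diff) calls `get_rate_limiter(provider)` and then `limiter.get_usage_stats()`, but the limiter's internals are not shown. A sliding-window counter is one common way to build such a limiter. The sketch below is hypothetical: the class name, `try_acquire`, and the shape of the stats dictionary are assumptions. The injectable `clock` makes it testable without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_s-second span (hypothetical sketch)."""

    def __init__(self, max_calls: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps of accepted calls, oldest first

    def _evict(self, now: float) -> None:
        # Drop timestamps that have fallen out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()

    def try_acquire(self) -> bool:
        now = self.clock()
        self._evict(now)
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False  # caller should back off or wait

    def get_usage_stats(self) -> Dict[str, int]:
        self._evict(self.clock())
        return {"used": len(self.calls), "limit": self.max_calls}


limiter = SlidingWindowLimiter(max_calls=2, window_s=60.0)
print(limiter.try_acquire(), limiter.get_usage_stats())
```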

code_chatbot/ingestion/__init__.py ADDED

File without changes

code_chatbot/{chunker.py → ingestion/chunker.py} RENAMED

File without changes

code_chatbot/{incremental_indexing.py → ingestion/incremental_indexing.py} RENAMED

@@ -43,7 +43,7 @@ def add_incremental_indexing_methods(indexer_class):
         if not self.config.indexing.enable_incremental_indexing:
             logger.info("Incremental indexing disabled, performing full index")
             # Fall back to full indexing
-            from code_chatbot.universal_ingestor import UniversalIngestor
             ingestor = UniversalIngestor(source_path)
             ingestor.download()
@@ -138,7 +138,7 @@ def add_incremental_indexing_methods(indexer_class):
             collection_name: Name of the collection
             vector_db_type: Type of vector database
         """
-        from code_chatbot.indexer import get_chroma_client
         try:
             if vector_db_type == "chroma":
@@ -185,7 +185,7 @@ def add_incremental_indexing_methods(indexer_class):
         Returns:
             Dictionary with stats (total_chunks, unique_files, etc.)
         """
-        from code_chatbot.indexer import get_chroma_client
         try:
             chroma_client = get_chroma_client(self.persist_directory)

         if not self.config.indexing.enable_incremental_indexing:
             logger.info("Incremental indexing disabled, performing full index")
             # Fall back to full indexing
+            from code_chatbot.ingestion.universal_ingestor import UniversalIngestor
             ingestor = UniversalIngestor(source_path)
             ingestor.download()
             collection_name: Name of the collection
             vector_db_type: Type of vector database
         """
+        from code_chatbot.core.db_connection import get_chroma_client
         try:
             if vector_db_type == "chroma":
         Returns:
             Dictionary with stats (total_chunks, unique_files, etc.)
         """
+        from code_chatbot.core.db_connection import get_chroma_client
         try:
             chroma_client = get_chroma_client(self.persist_directory)

code_chatbot/{indexer.py → ingestion/indexer.py} RENAMED

@@ -4,16 +4,16 @@ from pathlib import Path
 from langchain_core.documents import Document
 from langchain_community.vectorstores import Chroma
 from langchain_google_genai import GoogleGenerativeAIEmbeddings
-from code_chatbot.chunker import StructuralChunker
-from code_chatbot.merkle_tree import MerkleTree, ChangeSet
-from code_chatbot.path_obfuscator import PathObfuscator
-from code_chatbot.config import get_config
 import shutil
 import logging
 logger = logging.getLogger(__name__)
-from code_chatbot.db_connection import (
     get_chroma_client,
     reset_chroma_clients,
     set_active_vector_db,
@@ -421,5 +421,5 @@ class Indexer:
                 raise
 # Add incremental indexing methods to the Indexer class
-from code_chatbot.incremental_indexing import add_incremental_indexing_methods
 Indexer = add_incremental_indexing_methods(Indexer)

 from langchain_core.documents import Document
 from langchain_community.vectorstores import Chroma
 from langchain_google_genai import GoogleGenerativeAIEmbeddings
+from code_chatbot.ingestion.chunker import StructuralChunker
+from code_chatbot.ingestion.merkle_tree import MerkleTree, ChangeSet
+from code_chatbot.core.path_obfuscator import PathObfuscator
+from code_chatbot.core.config import get_config
 import shutil
 import logging
 logger = logging.getLogger(__name__)
+from code_chatbot.core.db_connection import (
     get_chroma_client,
     reset_chroma_clients,
     set_active_vector_db,
                 raise
 # Add incremental indexing methods to the Indexer class
+from code_chatbot.ingestion.incremental_indexing import add_incremental_indexing_methods
 Indexer = add_incremental_indexing_methods(Indexer)
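`indexer.py` ends by rebinding `Indexer` to the return value of `add_incremental_indexing_methods(Indexer)`. The helper takes the class, attaches methods (the diff shows one returning a stats dictionary with `total_chunks`, `unique_files`, etc.) and returns it. Below is a minimal sketch of that class-patching pattern. `add_stats_methods` and its method body are hypothetical and are not the repository's implementation.

```python
def add_stats_methods(indexer_class):
    """Hypothetical helper: attach a method to an existing class and return the class."""

    def get_index_stats(self) -> dict:
        # Illustrative only; real stats would come from the vector store.
        return {"total_chunks": len(self.chunks)}

    indexer_class.get_index_stats = get_index_stats
    return indexer_class


class Indexer:
    def __init__(self) -> None:
        self.chunks = ["chunk-a", "chunk-b"]


# Same shape as the last line of indexer.py: rebind the name to the patched class.
Indexer = add_stats_methods(Indexer)
print(Indexer().get_index_stats())
```

Because the patch mutates and returns the same class object, existing references to `Indexer` and new instances both see the added method.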

code_chatbot/{indexing_progress.py → ingestion/indexing_progress.py} RENAMED

@@ -27,12 +27,12 @@ def index_with_progress(
     Index a codebase with detailed progress tracking.
     Returns (chat_engine, success)
     """
-    from code_chatbot.universal_ingestor import process_source
-    from code_chatbot.ast_analysis import ASTGraphBuilder
-    from code_chatbot.indexer import Indexer
-    from code_chatbot.graph_rag import GraphEnhancedRetriever
-    from code_chatbot.rag import ChatEngine
-    from code_chatbot.chunker import StructuralChunker
     from langchain_community.vectorstores import Chroma, FAISS
     from langchain_community.vectorstores.utils import filter_complex_metadata
@@ -147,7 +147,7 @@ def index_with_progress(
             progress_bar.progress(1.0)
         else:  # Chroma
-            from code_chatbot.indexer import get_chroma_client, reset_chroma_clients
             # Reset client cache to avoid stale/corrupt connections
             reset_chroma_clients()

     Index a codebase with detailed progress tracking.
     Returns (chat_engine, success)
     """
+    from code_chatbot.ingestion.universal_ingestor import process_source
+    from code_chatbot.analysis.ast_analysis import ASTGraphBuilder
+    from code_chatbot.ingestion.indexer import Indexer
+    from code_chatbot.retrieval.graph_rag import GraphEnhancedRetriever
+    from code_chatbot.retrieval.rag import ChatEngine
+    from code_chatbot.ingestion.chunker import StructuralChunker
     from langchain_community.vectorstores import Chroma, FAISS
     from langchain_community.vectorstores.utils import filter_complex_metadata
             progress_bar.progress(1.0)
         else:  # Chroma
+            from code_chatbot.core.db_connection import get_chroma_client, reset_chroma_clients
             # Reset client cache to avoid stale/corrupt connections
             reset_chroma_clients()

code_chatbot/{merkle_tree.py → ingestion/merkle_tree.py} RENAMED

File without changes
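`indexer.py` imports `MerkleTree` and `ChangeSet` from this module to support incremental indexing. The idea is to hash every file, combine the hashes into a root digest, and re-index only what changed. The sketch below is a flattened, two-level version of that idea (file digests plus one root digest). A full Merkle tree adds per-directory nodes so that unchanged subtrees can be skipped without comparing each file. `Changes` is a hypothetical stand-in; the real `ChangeSet` may have different fields.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


def file_hashes(files: Dict[str, bytes]) -> Dict[str, str]:
    """Leaf level: one SHA-256 digest per file path."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def root_hash(hashes: Dict[str, str]) -> str:
    """Root level: a digest over all (path, leaf digest) pairs in sorted order."""
    h = hashlib.sha256()
    for path in sorted(hashes):
        h.update(path.encode() + b"\0" + hashes[path].encode() + b"\n")
    return h.hexdigest()


@dataclass
class Changes:
    """Hypothetical change set; the real merkle_tree.ChangeSet may differ."""
    added: List[str] = field(default_factory=list)
    modified: List[str] = field(default_factory=list)
    deleted: List[str] = field(default_factory=list)


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Changes:
    if root_hash(old) == root_hash(new):
        return Changes()  # identical roots: nothing to re-index
    return Changes(
        added=sorted(new.keys() - old.keys()),
        modified=sorted(p for p in new.keys() & old.keys() if new[p] != old[p]),
        deleted=sorted(old.keys() - new.keys()),
    )


old = file_hashes({"a.py": b"x", "b.py": b"y"})
new = file_hashes({"a.py": b"x2", "c.py": b"z"})
print(diff_snapshots(old, new))
```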

code_chatbot/{universal_ingestor.py → ingestion/universal_ingestor.py} RENAMED

@@ -135,8 +135,46 @@ class UniversalIngestor(DataManager):
     def download(self) -> bool:
         """Downloads/prepares the data."""
-        return self.delegate.download()
     def walk(self, get_content: bool = True) -> Generator[Tuple[Any, Dict], None, None]:
         """Yields (content, metadata) tuples."""
         yield from self.delegate.walk(get_content)
@@ -177,7 +215,8 @@ class ZIPFileManager(DataManager):
         IGNORE_EXTENSIONS = {
             '.pyc', '.png', '.jpg', '.jpeg', '.gif', '.ico', '.svg', '.mp4', '.mov',
             '.zip', '.tar', '.gz', '.pdf', '.exe', '.bin', '.pkl', '.npy', '.pt', '.pth',
-            '.lock', '.log', '.sqlite3', '.db', '.min.js', '.min.css', '.map'
         }
         # Files to ignore by exact name (lock files, etc.)
         IGNORE_FILES = {
@@ -235,7 +274,8 @@ class LocalDirectoryManager(DataManager):
         IGNORE_EXTENSIONS = {
             '.pyc', '.png', '.jpg', '.jpeg', '.gif', '.ico', '.svg', '.mp4', '.mov',
             '.zip', '.tar', '.gz', '.pdf', '.exe', '.bin', '.pkl', '.npy', '.pt', '.pth',
-            '.lock', '.log', '.sqlite3', '.db', '.min.js', '.min.css', '.map'
         }
         # Files to ignore by exact name (lock files, etc.)
         IGNORE_FILES = {

     def download(self) -> bool:
         """Downloads/prepares the data."""
+        success = self.delegate.download()
+        if success:
+            self._clean_extracted_files()
+        return success
+    def _clean_extracted_files(self):
+        """Removes unnecessary files/directories from the extracted data."""
+        path = self.local_path
+        if not os.path.exists(path):
+            return
+        logger.info(f"Cleaning execution artifacts from {path}")
+        # Directories to remove completely
+        DIRS_TO_REMOVE = {'.git', '__pycache__', 'node_modules', '.ipynb_checkpoints', '.pytest_cache', '.dart_tool'}
+        # Files to remove
+        FILES_TO_REMOVE = {'.DS_Store', 'Thumbs.db', '.gitignore', '.gitattributes'}
+        for root, dirs, files in os.walk(path, topdown=False):
+            # Remove directories
+            for name in dirs:
+                if name in DIRS_TO_REMOVE:
+                    dir_path = os.path.join(root, name)
+                    try:
+                        shutil.rmtree(dir_path)
+                        logger.info(f"Removed directory: {dir_path}")
+                    except Exception as e:
+                        logger.warning(f"Failed to remove {dir_path}: {e}")
+            # Remove files
+            for name in files:
+                if name in FILES_TO_REMOVE:
+                    file_path = os.path.join(root, name)
+                    try:
+                        os.remove(file_path)
+                        logger.info(f"Removed file: {file_path}")
+                    except Exception as e:
+                        logger.warning(f"Failed to remove {file_path}: {e}")
     def walk(self, get_content: bool = True) -> Generator[Tuple[Any, Dict], None, None]:
         """Yields (content, metadata) tuples."""
         yield from self.delegate.walk(get_content)
         IGNORE_EXTENSIONS = {
             '.pyc', '.png', '.jpg', '.jpeg', '.gif', '.ico', '.svg', '.mp4', '.mov',
             '.zip', '.tar', '.gz', '.pdf', '.exe', '.bin', '.pkl', '.npy', '.pt', '.pth',
+            '.lock', '.log', '.sqlite3', '.db', '.min.js', '.min.css', '.map',
+            '.graphml', '.xml', '.toml'
         }
         # Files to ignore by exact name (lock files, etc.)
         IGNORE_FILES = {
         IGNORE_EXTENSIONS = {
             '.pyc', '.png', '.jpg', '.jpeg', '.gif', '.ico', '.svg', '.mp4', '.mov',
             '.zip', '.tar', '.gz', '.pdf', '.exe', '.bin', '.pkl', '.npy', '.pt', '.pth',
+            '.lock', '.log', '.sqlite3', '.db', '.min.js', '.min.css', '.map',
+            '.graphml', '.xml', '.toml'
         }
         # Files to ignore by exact name (lock files, etc.)
         IGNORE_FILES = {
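`IGNORE_EXTENSIONS` mixes single suffixes (`.js`-style) with multi-dot ones such as `.min.js` and `.min.css`. The diff does not show how the set is consulted. The sketch below contrasts two plausible checks on a subset of the set: `os.path.splitext` returns only the last suffix, so it cannot match `.min.js`, whereas `str.endswith` with a tuple can. Both helper names are hypothetical.

```python
import os

# Subset of IGNORE_EXTENSIONS from the diff above, including the newly added entries.
IGNORE_EXTENSIONS = {'.pyc', '.png', '.lock', '.log', '.min.js', '.min.css', '.map',
                     '.graphml', '.xml', '.toml'}


def ignored_by_splitext(name: str) -> bool:
    # os.path.splitext keeps only the last suffix: "app.min.js" -> ".js"
    return os.path.splitext(name)[1].lower() in IGNORE_EXTENSIONS


def ignored_by_endswith(name: str) -> bool:
    # str.endswith accepts a tuple, so multi-dot suffixes such as ".min.js" match
    return name.lower().endswith(tuple(IGNORE_EXTENSIONS))


for name in ["app.min.js", "main.py", "pyproject.toml"]:
    print(name, ignored_by_splitext(name), ignored_by_endswith(name))
```

With `.toml` and `.xml` in the set, either check also skips files such as `pyproject.toml`, `Cargo.toml` and `pom.xml`, so their contents will not be indexed.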

code_chatbot/mcp/__init__.py ADDED

File without changes

code_chatbot/{mcp_client.py → mcp/mcp_client.py} RENAMED

@@ -6,7 +6,7 @@ Provides async methods to call MCP tools from other parts of the application.
 import logging
 from typing import List, Dict, Optional
-from code_chatbot.mcp_server import RefactorMCPServer, SearchResult, RefactorResult, RefactorSuggestion
 logger = logging.getLogger(__name__)

 import logging
 from typing import List, Dict, Optional
+from code_chatbot.mcp.mcp_server import RefactorMCPServer, SearchResult, RefactorResult, RefactorSuggestion
 logger = logging.getLogger(__name__)

code_chatbot/{mcp_server.py → mcp/mcp_server.py} RENAMED

File without changes

code_chatbot/retrieval/__init__.py ADDED

File without changes

code_chatbot/{graph_rag.py → retrieval/graph_rag.py} RENAMED

File without changes
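Stage 4 of the retrieval pipeline ("Graph Enhancement") adds related files from the AST graph, such as imports and calls, to the retrieved set. `GraphEnhancedRetriever` itself is unchanged here. As a rough sketch, assume the graph is a hypothetical map of file → related files. Expansion then appends unseen neighbors of each hit, capped by an assumed `max_extra` so the context does not grow without bound.

```python
from typing import Dict, List, Set


def expand_with_neighbors(hits: List[str], graph: Dict[str, Set[str]],
                          max_extra: int = 3) -> List[str]:
    """Append graph neighbors of retrieved files, keeping original hits first."""
    result = list(hits)
    seen = set(hits)
    for path in hits:
        for neighbor in sorted(graph.get(path, set())):
            if len(result) - len(hits) >= max_extra:
                return result  # cap reached
            if neighbor not in seen:
                seen.add(neighbor)
                result.append(neighbor)
    return result


# Hypothetical import graph: file -> files it imports.
IMPORTS = {"login.py": {"tokens.py", "models.py"}, "views.py": {"login.py"}}
print(expand_with_neighbors(["login.py"], IMPORTS))
```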

code_chatbot/{llm_retriever.py → retrieval/llm_retriever.py} RENAMED

File without changes
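`rag.py` combines `LLMRetriever` with the vector retriever through LangChain's `EnsembleRetriever`, which LangChain documents as merging result lists with weighted Reciprocal Rank Fusion. The architecture page gives the weights as 60% vector and 40% LLM selection. The sketch below implements weighted RRF in plain Python. The constant `c=60` is the value commonly used for RRF and is an assumption here, not a setting read from this repository.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def weighted_rrf(rankings: Sequence[List[str]], weights: Sequence[float],
                 c: int = 60) -> List[str]:
    """Fuse ranked lists: each doc scores sum(weight / (rank + c)) across lists."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


vector_hits = ["a.py", "b.py", "c.py"]  # Stage 1: vector retrieval
llm_hits = ["c.py", "a.py"]             # Stage 2: LLM file selection
print(weighted_rrf([vector_hits, llm_hits], [0.6, 0.4]))
```

RRF uses only ranks, not raw scores, so the two retrievers' incompatible scoring scales do not need to be normalized before merging.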

code_chatbot/{rag.py → retrieval/rag.py} RENAMED

@@ -7,8 +7,8 @@ from langchain_core.messages import HumanMessage, SystemMessage, AIMessage
 from langchain_core.retrievers import BaseRetriever
 # Simplified implementation that works with current langchain version
 # We'll implement history-aware retrieval manually
-from code_chatbot.reranker import Reranker
-from code_chatbot.retriever_wrapper import build_enhanced_retriever
 import os
 # Configure logging
@@ -77,7 +77,7 @@ class ChatEngine:
         self.llm_retriever = None
         if self.repo_files:
             try:
-                from code_chatbot.llm_retriever import LLMRetriever
                 from langchain.retrievers import EnsembleRetriever
                 logger.info(f"Initializing LLMRetriever with {len(self.repo_files)} files.")
@@ -103,8 +103,8 @@ class ChatEngine:
         self.code_analyzer = None
         if self.use_agent:
             try:
-                from code_chatbot.agent_workflow import create_agent_graph
-                from code_chatbot.ast_analysis import EnhancedCodeAnalyzer
                 import os
                 logger.info(f"Building Agentic Workflow Graph for {self.repo_dir}...")
@@ -239,7 +239,7 @@ class ChatEngine:
             # Rebuild agent if using agents
             if self.use_agent:
                 try:
-                    from code_chatbot.agent_workflow import create_agent_graph
                     self.agent_executor = create_agent_graph(
                         llm=self.llm,
                         retriever=self.vector_retriever,
@@ -288,7 +288,7 @@ class ChatEngine:
                 # Contextualize with history
                 # Use comprehensive system prompt for high-quality answers
-                from code_chatbot.prompts import get_prompt_for_provider
                 sys_content = get_prompt_for_provider("system_agent", self.provider).format(repo_name=self.repo_name)
                 system_msg = SystemMessage(content=sys_content)
@@ -320,13 +320,7 @@ class ChatEngine:
                         answer = raw_content
                     # CLEANING: Remove hallucinated source chips
-                    import re
-                    # Remove the specific div block structure
-                    answer = re.sub(r'<div class="source-chip">.*?</div>\s*</div>', '', answer, flags=re.DOTALL)
-                    # Remove standalone chips if any remain
-                    answer = re.sub(r'<div class="source-chip">.*?</div>', '', answer, flags=re.DOTALL)
-                    # Clean up leading whitespace/newlines left behind
-                    answer = answer.strip()
                     # Update history
                     self.chat_history.append(HumanMessage(content=question))
@@ -371,6 +365,21 @@ class ChatEngine:
             logger.error(f"Error during chat: {e}", exc_info=True)
             return f"Error: {str(e)}", []
     def _linear_chat(self, question: str) -> Tuple[str, List[dict]]:
         """Linear RAG fallback."""
         messages, sources, _ = self._prepare_chat_context(question)
@@ -381,7 +390,7 @@ class ChatEngine:
         # Get response from LLM
         try:
             response_msg = self.llm.invoke(messages)
-            answer = response_msg.content
         except Exception as e:
             # Check for Rate Limit in Linear Chat
             error_str = str(e)
@@ -460,7 +469,7 @@ class ChatEngine:
             })
         # Build prompt with history - use provider-specific prompt
-        from code_chatbot.prompts import get_prompt_for_provider
         base_prompt = get_prompt_for_provider("linear_rag", self.provider)
         qa_system_prompt = base_prompt.format(
             repo_name=self.repo_name,

 from langchain_core.retrievers import BaseRetriever
 # Simplified implementation that works with current langchain version
 # We'll implement history-aware retrieval manually
+from code_chatbot.retrieval.reranker import Reranker
+from code_chatbot.retrieval.retriever_wrapper import build_enhanced_retriever
 import os
 # Configure logging
         self.llm_retriever = None
         if self.repo_files:
             try:
+                from code_chatbot.retrieval.llm_retriever import LLMRetriever
                 from langchain.retrievers import EnsembleRetriever
                 logger.info(f"Initializing LLMRetriever with {len(self.repo_files)} files.")
         self.code_analyzer = None
         if self.use_agent:
             try:
+                from code_chatbot.agents.agent_workflow import create_agent_graph
+                from code_chatbot.analysis.ast_analysis import EnhancedCodeAnalyzer
                 import os
                 logger.info(f"Building Agentic Workflow Graph for {self.repo_dir}...")
             # Rebuild agent if using agents
             if self.use_agent:
                 try:
+                    from code_chatbot.agents.agent_workflow import create_agent_graph
                     self.agent_executor = create_agent_graph(
                         llm=self.llm,
                         retriever=self.vector_retriever,
                 # Contextualize with history
                 # Use comprehensive system prompt for high-quality answers
+                from code_chatbot.core.prompts import get_prompt_for_provider
                 sys_content = get_prompt_for_provider("system_agent", self.provider).format(repo_name=self.repo_name)
                 system_msg = SystemMessage(content=sys_content)
                         answer = raw_content
                     # CLEANING: Remove hallucinated source chips
+                    answer = self._clean_response(answer)
                     # Update history
                     self.chat_history.append(HumanMessage(content=question))
             logger.error(f"Error during chat: {e}", exc_info=True)
             return f"Error: {str(e)}", []
+    def _clean_response(self, text: str) -> str:
+        """Clean response from hallucinated HTML/CSS artifacts."""
+        if not text:
+            return ""
+        import re
+        # Remove the specific div block structure for source chips
+        clean_text = re.sub(r'<div class="source-chip">.*?</div>\s*</div>', '', text, flags=re.DOTALL)
+        # Remove standalone chips if any remain
+        clean_text = re.sub(r'<div class="source-chip">.*?</div>', '', clean_text, flags=re.DOTALL)
+        # Remove source-container divs
+        clean_text = re.sub(r'<div class="source-container">.*?</div>', '', clean_text, flags=re.DOTALL)
+        return clean_text.strip()
     def _linear_chat(self, question: str) -> Tuple[str, List[dict]]:
         """Linear RAG fallback."""
         messages, sources, _ = self._prepare_chat_context(question)
         # Get response from LLM
         try:
             response_msg = self.llm.invoke(messages)
+            answer = self._clean_response(response_msg.content)
         except Exception as e:
             # Check for Rate Limit in Linear Chat
             error_str = str(e)
             })
         # Build prompt with history - use provider-specific prompt
+        from code_chatbot.core.prompts import get_prompt_for_provider
         base_prompt = get_prompt_for_provider("linear_rag", self.provider)
         qa_system_prompt = base_prompt.format(
             repo_name=self.repo_name,
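The new `_clean_response` method applies three regex substitutions in order. The standalone copy below reproduces those substitutions so they can be tried without constructing a `ChatEngine`; the function name `clean_response` is only for this sketch.

```python
import re


def clean_response(text: str) -> str:
    """Standalone copy of the regex steps in ChatEngine._clean_response."""
    if not text:
        return ""
    # 1. A chip followed by one more closing tag (chip inside a wrapper div)
    clean = re.sub(r'<div class="source-chip">.*?</div>\s*</div>', '', text, flags=re.DOTALL)
    # 2. Any remaining standalone chips
    clean = re.sub(r'<div class="source-chip">.*?</div>', '', clean, flags=re.DOTALL)
    # 3. Source containers, up to the first closing tag
    clean = re.sub(r'<div class="source-container">.*?</div>', '', clean, flags=re.DOTALL)
    return clean.strip()


print(repr(clean_response('Answer text.\n<div class="source-chip">auth.py</div>')))
print(repr(clean_response(
    '<div class="source-container"><div class="source-chip">a.py</div></div>\nAnswer')))
```

Tracing the second call shows an ordering effect. Step 1 consumes the chip and both closing tags, so the opening `<div class="source-container">` is left without a `</div>`, and step 3 no longer matches it.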

code_chatbot/{reranker.py → retrieval/reranker.py} RENAMED

File without changes
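Stage 5 ("Cross-Encoder Reranking") scores each (query, document) pair jointly and keeps the top 5. The `Reranker` class is unchanged in this commit and its internals are not shown. The sketch below keeps the scorer pluggable. `overlap_score` is a toy word-overlap stand-in so the example runs without a model; in practice a cross-encoder would supply the scores, for example sentence-transformers' `CrossEncoder.predict`, which takes a list of (query, document) pairs.

```python
from typing import Callable, List, Tuple


def rerank(query: str, docs: List[str], score: Callable[[str, str], float],
           top_n: int = 5) -> List[Tuple[str, float]]:
    """Score every (query, doc) pair and return the top_n highest-scoring docs."""
    scored = [(doc, score(query, doc)) for doc in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored[:top_n]


def overlap_score(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder: fraction of query words present in the doc.
    q = set(query.lower().split())
    return len(q & set(doc.lower().split())) / (len(q) or 1)


docs = ["def authenticate(user, password)",
        "invalid password raises AuthError",
        "render the sidebar"]
top = rerank("invalid password", docs, overlap_score, top_n=2)
print(top)
```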

code_chatbot/{retriever_wrapper.py → retrieval/retriever_wrapper.py} RENAMED

@@ -5,7 +5,7 @@ from typing import List, Optional, Any
 from langchain_core.retrievers import BaseRetriever
 from langchain_core.documents import Document
 from langchain_core.callbacks import CallbackManagerForRetrieverRun
-from code_chatbot.reranker import Reranker
 # Try to import MultiQueryRetriever - may not be available in all versions
 try:

 from langchain_core.retrievers import BaseRetriever
 from langchain_core.documents import Document
 from langchain_core.callbacks import CallbackManagerForRetrieverRun
+from code_chatbot.retrieval.reranker import Reranker
 # Try to import MultiQueryRetriever - may not be available in all versions
 try:

components/file_explorer.py CHANGED

@@ -80,7 +80,7 @@ def render_tree_items(tree: Dict, depth: int):
     for name, node in sorted_items:
         is_file = node.get("_type") == "file"
-        indent = "│  " * depth
         if is_file:
             # File item

     for name, node in sorted_items:
         is_file = node.get("_type") == "file"
+        indent = "│ " * depth # Compact indent for sidebar
         if is_file:
             # File item
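`render_tree_items(tree, depth)` now builds its indent as `"│ " * depth` instead of `"│  " * depth`, saving one character per nesting level in the sidebar. The sketch below is a plain-text recursive renderer showing the effect of that indent. It assumes directory nodes keep their entries under a hypothetical `"children"` key and that folders sort before files; only the `"_type" == "file"` check comes from the diff.

```python
from typing import Dict, List


def render_tree(tree: Dict, depth: int = 0) -> List[str]:
    """Render a nested dict as indented lines, folders first (assumed ordering)."""
    lines: List[str] = []
    items = sorted(tree.items(), key=lambda kv: (kv[1].get("_type") == "file", kv[0]))
    for name, node in items:
        indent = "│ " * depth  # compact indent, as in the new file_explorer.py
        if node.get("_type") == "file":
            lines.append(f"{indent}{name}")
        else:
            lines.append(f"{indent}{name}/")
            # Assumption: directory entries live under a "children" key.
            lines.extend(render_tree(node.get("children", {}), depth + 1))
    return lines


tree = {"src": {"_type": "dir", "children": {"app.py": {"_type": "file"}}},
        "README.md": {"_type": "file"}}
print("\n".join(render_tree(tree)))
```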

components/multi_mode.py CHANGED

@@ -135,7 +135,7 @@ def render_search_mode():
         with st.spinner("Searching codebase..."):
             try:
-                from code_chatbot.mcp_client import MCPClient
                 client = MCPClient(workspace_root=workspace)
                 results = client.search_code(
@@ -222,7 +222,7 @@ def render_refactor_mode():
             with st.spinner("Processing refactoring..."):
                 try:
-                    from code_chatbot.mcp_client import MCPClient
                     client = MCPClient(workspace_root=workspace)
                     result = client.refactor_code(
@@ -298,7 +298,7 @@ def render_refactor_mode():
         if st.button("Apply Refactoring", type="primary", use_container_width=True):
             with st.spinner("Processing..."):
                 try:
-                    from code_chatbot.mcp_client import MCPClient
                     client = MCPClient(workspace_root=workspace)
                     result = client.refactor_code(

         with st.spinner("Searching codebase..."):
             try:
+                from code_chatbot.mcp.mcp_client import MCPClient
                 client = MCPClient(workspace_root=workspace)
                 results = client.search_code(
             with st.spinner("Processing refactoring..."):
                 try:
+                    from code_chatbot.mcp.mcp_client import MCPClient
                     client = MCPClient(workspace_root=workspace)
                     result = client.refactor_code(
         if st.button("Apply Refactoring", type="primary", use_container_width=True):
             with st.spinner("Processing..."):
                 try:
+                    from code_chatbot.mcp.mcp_client import MCPClient
                     client = MCPClient(workspace_root=workspace)
                     result = client.refactor_code(
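The multi_mode.py hunks, like the sidebar.py and test hunks, apply one mechanical edit: each old flat module path under `code_chatbot` is replaced with its new subpackage path. The sketch below shows how such rewrites could be automated with a regular expression over source text. The mapping lists only renames visible in this commit's file list. The script itself is hypothetical and is not part of the commit.

```python
import re

# Old flat module path -> new subpackage path (renames listed in this commit).
MODULE_MOVES = {
    "code_chatbot.mcp_client": "code_chatbot.mcp.mcp_client",
    "code_chatbot.rate_limiter": "code_chatbot.core.rate_limiter",
    "code_chatbot.reranker": "code_chatbot.retrieval.reranker",
    "code_chatbot.merkle_tree": "code_chatbot.ingestion.merkle_tree",
    "code_chatbot.prompts": "code_chatbot.core.prompts",
}

# Match "from <module> import" so only complete module paths are rewritten.
_FROM_IMPORT = re.compile(r"^(\s*from\s+)([\w.]+)(\s+import\b)", re.MULTILINE)


def rewrite_imports(source: str) -> str:
    """Rewrite 'from old.module import X' lines to their new module paths."""
    def _swap(match: re.Match) -> str:
        module = match.group(2)
        return match.group(1) + MODULE_MOVES.get(module, module) + match.group(3)

    return _FROM_IMPORT.sub(_swap, source)


before = "    from code_chatbot.mcp_client import MCPClient\n"
print(rewrite_imports(before), end="")
```

Because new paths are not keys in the mapping, running the rewrite twice leaves already-migrated files unchanged.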

components/sidebar.py CHANGED

@@ -117,7 +117,7 @@ def render_sidebar():
             # Show usage statistics if available
             if st.session_state.chat_engine:
                 try:
-                    from code_chatbot.rate_limiter import get_rate_limiter
                     limiter = get_rate_limiter(provider)
                     stats = limiter.get_usage_stats()

             # Show usage statistics if available
             if st.session_state.chat_engine:
                 try:
+                    from code_chatbot.core.rate_limiter import get_rate_limiter
                     limiter = get_rate_limiter(provider)
                     stats = limiter.get_usage_stats()
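The sidebar calls `get_rate_limiter(provider)` and then `limiter.get_usage_stats()`, now imported from `code_chatbot.core.rate_limiter`. Neither implementation appears in this diff. The sketch below assumes a per-provider registry that hands out one shared sliding-window limiter per provider. The class name, the default limit and the keys of the stats dictionary are all hypothetical.

```python
import time
from collections import deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Hypothetical limiter: at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps: Deque[float] = deque()

    def _prune(self, now: float) -> None:
        # Drop request timestamps that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def try_acquire(self, now: Optional[float] = None) -> bool:
        """Record a request and return True, or return False if over the limit."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True

    def get_usage_stats(self, now: Optional[float] = None) -> Dict[str, int]:
        """Report how many requests the current window has used."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        used = len(self._timestamps)
        return {"used": used, "limit": self.max_requests, "remaining": self.max_requests - used}


_LIMITERS: Dict[str, SlidingWindowLimiter] = {}


def get_rate_limiter(provider: str, max_requests: int = 15) -> SlidingWindowLimiter:
    """Return one shared limiter per provider (max_requests is a placeholder, used only on creation)."""
    if provider not in _LIMITERS:
        _LIMITERS[provider] = SlidingWindowLimiter(max_requests)
    return _LIMITERS[provider]


limiter = get_rate_limiter("example-provider", max_requests=2)
print([limiter.try_acquire(now=0.0) for _ in range(3)])  # third call is refused
print(limiter.get_usage_stats(now=30.0))
```

Returning the same object for a provider is what lets the sidebar report stats that reflect requests made elsewhere in the app.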

pages/1_⚡_Code_Studio.py CHANGED

@@ -12,92 +12,101 @@ st.set_page_config(
     page_title="Code Studio",
     page_icon="⚡",
     layout="wide",
-    initial_sidebar_state="collapsed" # Hide standard sidebar
 )
 apply_custom_css()
 # --- State Management ---
-if "active_tab" not in st.session_state:
-    st.session_state.active_tab = "explorer"
 if "processed_files" not in st.session_state or not st.session_state.processed_files:
-    # If accessed directly without processing, redirect home
     st.warning("⚠️ Please index a codebase first.")
     if st.button("Go Home"):
         st.switch_page("app.py")
     st.stop()
-# --- Layout ---
-# We use a 2-column layout: Side Panel (with Tabs) | Main Editor
-# Ratio: 3 : 7 - Narrower side panel for better proportion
-col_panel, col_editor = st.columns([3, 7])
-# --- Side Panel (Tabs) ---
-with col_panel:
-    # Use native Streamlit tabs for horizontal navigation
-    tab_explorer, tab_search, tab_chat, tab_generate = st.tabs(["📁 Explorer", "🔍 Search", "💬 Chat", "✨ Generate"])
-    with tab_explorer:
-        st.markdown("### 📁 Project Files")
-        render_file_tree(
-            st.session_state.get("indexed_files", []),
-            st.session_state.get("workspace_root", "")
-        )
-        st.divider()
-        if st.button("🏠 Index New Codebase", use_container_width=True):
-            st.session_state.processed_files = False
-            st.session_state.chat_engine = None
-            st.session_state.indexed_files = None
-            st.session_state.workspace_root = None
-            st.session_state.selected_file = None
-            st.switch_page("app.py")
-    with tab_search:
-        render_search_panel(st.session_state.get("indexed_files", []))
     with tab_chat:
         chat_engine = st.session_state.get("chat_engine")
         if chat_engine:
             render_chat_panel(chat_engine)
         else:
-            st.error("Chat engine unavailable. Please index a codebase first.")
-    with tab_generate:
         chat_engine = st.session_state.get("chat_engine")
         if chat_engine:
             render_generate_panel(chat_engine, st.session_state.get("indexed_files", []))
-        else:
-            st.error("Chat engine unavailable.")
-# --- Main Editor ---
-with col_editor:
-    # If a file is selected, show it. Otherwise show welcome/empty state.
-    selected_file = st.session_state.get("selected_file")
-    if selected_file:
-        # We use a container to ensure height consistency
-        with st.container():
-            # Alignment Spacer: Matches the height of st.tabs headers (~50px)
-            st.markdown("<div style='height: 50px;'></div>", unsafe_allow_html=True)
-            # Breadcrumbs / File Header
-            filename = os.path.basename(selected_file)
-            st.caption(f"Editing: {filename}")
-            # Code Viewer
             render_code_viewer_simple(selected_file)
-    else:
-        # Empty State
-        st.markdown(
-            """
-            <div style="display: flex; flex-direction: column; align-items: center; justify-content: center; height: 60vh; opacity: 0.5;">
-                <h1>⚡ Code Studio</h1>
-                <p>Select a file from the explorer to view context.</p>
-                <p>Use the tabs on the left to switch between tools.</p>
-            </div>
-            """,
-            unsafe_allow_html=True
-        )

     page_title="Code Studio",
     page_icon="⚡",
     layout="wide",
+    initial_sidebar_state="expanded"
 )
 apply_custom_css()
 # --- State Management ---
 if "processed_files" not in st.session_state or not st.session_state.processed_files:
     st.warning("⚠️ Please index a codebase first.")
     if st.button("Go Home"):
         st.switch_page("app.py")
     st.stop()
+# --- Sidebar: Navigation & Explorer ---
+with st.sidebar:
+    # 1. View Settings
+    st.header("⚙️ View")
+    layout_mode = st.radio("Layout Mode", ["Tabs (Full Width)", "Split View"], horizontal=True)
+    st.divider()
+    # 2. File Explorer
+    render_file_tree(
+        st.session_state.get("indexed_files", []),
+        st.session_state.get("workspace_root", "")
+    )
+    st.divider()
+    # 3. Actions
+    if st.button("🏠 New Codebase", use_container_width=True):
+        st.session_state.processed_files = False
+        st.session_state.chat_engine = None
+        st.session_state.indexed_files = None
+        st.session_state.workspace_root = None
+        st.session_state.selected_file = None
+        st.switch_page("app.py")
+# --- Main Workspace ---
+if layout_mode == "Tabs (Full Width)":
+    # TABBED LAYOUT (Default)
+    tab_chat, tab_code, tab_agent, tab_search = st.tabs(["💬 Chat", "📝 Code Editor", "✨ Agent", "🔍 Search"])
     with tab_chat:
         chat_engine = st.session_state.get("chat_engine")
         if chat_engine:
             render_chat_panel(chat_engine)
         else:
+            st.error("Chat engine unavailable.")
+    with tab_code:
+        selected_file = st.session_state.get("selected_file")
+        if selected_file:
+            filename = os.path.basename(selected_file)
+            st.caption(f"Editing: {filename}")
+            render_code_viewer_simple(selected_file)
+        else:
+            st.info("👈 Select a file from the sidebar to view code.")
+    with tab_agent:
         chat_engine = st.session_state.get("chat_engine")
         if chat_engine:
             render_generate_panel(chat_engine, st.session_state.get("indexed_files", []))
+    with tab_search:
+        render_search_panel(st.session_state.get("indexed_files", []))
+else:
+    # SPLIT VIEW (Legacy)
+    split_ratio = st.slider("Panel Width (%)", 20, 80, 40, step=5)
+    panel_width = split_ratio / 100.0
+    editor_width = 1.0 - panel_width
+    col_panel, col_editor = st.columns([panel_width, editor_width])
+    with col_panel:
+        tab_sub_chat, tab_sub_search, tab_sub_agent = st.tabs(["💬 Chat", "🔍 Search", "✨ Agent"])
+        with tab_sub_chat:
+            chat_engine = st.session_state.get("chat_engine")
+            if chat_engine:
+                render_chat_panel(chat_engine)
+        with tab_sub_search:
+            render_search_panel(st.session_state.get("indexed_files", []))
+        with tab_sub_agent:
+            chat_engine = st.session_state.get("chat_engine")
+            if chat_engine:
+                render_generate_panel(chat_engine, st.session_state.get("indexed_files", []))
+    with col_editor:
+        selected_file = st.session_state.get("selected_file")
+        if selected_file:
+            st.caption(f"Editing: {os.path.basename(selected_file)}")
             render_code_viewer_simple(selected_file)
+        else:
+            st.info("👈 Select a file from the sidebar.")
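In the new split view, the slider value (20 to 80 percent, default 40, step 5) becomes the side panel's share and the editor receives the remainder. The old layout kept in the .bak file used a default of 30. `st.columns` accepts a list of relative weights, so the two floats can be passed directly. The sketch below isolates that arithmetic in a pure function. The name `split_weights` and the clamping step are hypothetical additions; the slider already bounds the value, so clamping only matters if the ratio comes from somewhere else.

```python
from typing import Tuple


def split_weights(split_ratio: int, lo: int = 20, hi: int = 80) -> Tuple[float, float]:
    """Turn a panel-width percentage into (panel, editor) column weights."""
    # Clamp to the slider's range so neither column can collapse.
    split_ratio = max(lo, min(hi, split_ratio))
    panel_width = split_ratio / 100.0
    editor_width = 1.0 - panel_width
    return panel_width, editor_width


for ratio in (20, 40, 80):
    panel, editor = split_weights(ratio)
    print(ratio, round(panel, 2), round(editor, 2))
```

In the page, `col_panel, col_editor = st.columns(list(split_weights(split_ratio)))` would reproduce the diff's behavior.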

pages/1_⚡_Code_Studio.py.bak ADDED

@@ -0,0 +1,118 @@

+"""
+⚡ Code Studio - The Main IDE Interface
+"""
+import streamlit as st
+import os
+from components.style import apply_custom_css
+from components.file_explorer import render_file_tree
+from components.code_viewer import render_code_viewer_simple
+from components.panels import render_chat_panel, render_search_panel, render_generate_panel
+st.set_page_config(
+    page_title="Code Studio",
+    page_icon="⚡",
+    layout="wide",
+    initial_sidebar_state="collapsed" # Hide standard sidebar
+)
+apply_custom_css()
+# --- State Management ---
+if "active_tab" not in st.session_state:
+    st.session_state.active_tab = "explorer"
+if "processed_files" not in st.session_state or not st.session_state.processed_files:
+    # If accessed directly without processing, redirect home
+    st.warning("⚠️ Please index a codebase first.")
+    if st.button("Go Home"):
+        st.switch_page("app.py")
+    st.stop()
+# --- Layout Configuration ---
+# Allow user to resize the split
+with st.sidebar:
+    st.header("⚙️ Layout Settings")
+    split_ratio = st.slider(
+        "Panel Width (%)",
+        min_value=20,
+        max_value=80,
+        value=30,
+        step=5,
+        help="Adjust the width of the left panel (Chat/Explorer)."
+    )
+# Calculate column ratios based on percentage
+panel_width = split_ratio / 100.0
+editor_width = 1.0 - panel_width
+# Main Layout
+col_panel, col_editor = st.columns([panel_width, editor_width])
+# --- Side Panel (Tabs) ---
+with col_panel:
+    # Use native Streamlit tabs for horizontal navigation
+    tab_explorer, tab_search, tab_chat, tab_generate = st.tabs(["📁 Explorer", "🔍 Search", "💬 Chat", "✨ Generate"])
+    with tab_explorer:
+        st.markdown("### 📁 Project Files")
+        render_file_tree(
+            st.session_state.get("indexed_files", []),
+            st.session_state.get("workspace_root", "")
+        )
+        st.divider()
+        if st.button("🏠 Index New Codebase", use_container_width=True):
+            st.session_state.processed_files = False
+            st.session_state.chat_engine = None
+            st.session_state.indexed_files = None
+            st.session_state.workspace_root = None
+            st.session_state.selected_file = None
+            st.switch_page("app.py")
+    with tab_search:
+        render_search_panel(st.session_state.get("indexed_files", []))
+    with tab_chat:
+        chat_engine = st.session_state.get("chat_engine")
+        if chat_engine:
+            render_chat_panel(chat_engine)
+        else:
+            st.error("Chat engine unavailable. Please index a codebase first.")
+    with tab_generate:
+        chat_engine = st.session_state.get("chat_engine")
+        if chat_engine:
+            render_generate_panel(chat_engine, st.session_state.get("indexed_files", []))
+        else:
+            st.error("Chat engine unavailable.")
+# --- Main Editor ---
+with col_editor:
+    # If a file is selected, show it. Otherwise show welcome/empty state.
+    selected_file = st.session_state.get("selected_file")
+    if selected_file:
+        # We use a container to ensure height consistency
+        with st.container():
+            # Alignment Spacer: Matches the height of st.tabs headers (~50px)
+            st.markdown("<div style='height: 50px;'></div>", unsafe_allow_html=True)
+            # Breadcrumbs / File Header
+            filename = os.path.basename(selected_file)
+            st.caption(f"Editing: {filename}")
+            # Code Viewer
+            render_code_viewer_simple(selected_file)
+    else:
+        # Empty State
+        st.markdown(
+            """
+            <div style="display: flex; flex-direction: column; align-items: center; justify-content: center; height: 60vh; opacity: 0.5;">
+                <h1>⚡ Code Studio</h1>
+                <p>Select a file from the explorer to view context.</p>
+                <p>Use the tabs on the left to switch between tools.</p>
+            </div>
+            """,
+            unsafe_allow_html=True
+        )
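Both the new sidebar button ("🏠 New Codebase") and the old explorer-tab button kept in this .bak file reset the same five session keys before calling `st.switch_page("app.py")`. The sketch below factors that reset into one helper so the two layouts cannot drift apart. The name `reset_indexing_state` is hypothetical. The sketch assumes `st.session_state` supports dict-style item assignment, which it does, so a plain dict can stand in for it here.

```python
from typing import Any, MutableMapping

# Keys the diff's "New Codebase" handlers clear, with their reset values.
_RESET_VALUES = {
    "processed_files": False,
    "chat_engine": None,
    "indexed_files": None,
    "workspace_root": None,
    "selected_file": None,
}


def reset_indexing_state(state: MutableMapping[str, Any]) -> None:
    """Clear everything tied to the currently indexed codebase."""
    for key, value in _RESET_VALUES.items():
        state[key] = value


# A plain dict stands in for st.session_state; unrelated keys are left alone.
session = {"processed_files": True, "chat_engine": object(), "selected_file": "app.py", "theme": "dark"}
reset_indexing_state(session)
print(session["processed_files"], session["selected_file"], session["theme"])
```

In the page, the button body would shrink to `reset_indexing_state(st.session_state)` followed by `st.switch_page("app.py")`.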

tests/test_merkle_tree_simple.py CHANGED

@@ -2,7 +2,7 @@
 Test script for Merkle tree change detection.
 """
-from code_chatbot.merkle_tree import MerkleTree
 from pathlib import Path
 import tempfile
 import shutil

 Test script for Merkle tree change detection.
 """
+from code_chatbot.ingestion.merkle_tree import MerkleTree
 from pathlib import Path
 import tempfile
 import shutil
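The test script exercises Merkle tree change detection, now imported from `code_chatbot.ingestion.merkle_tree`. The project's `MerkleTree` API is not shown in this diff. The flat sketch below only illustrates the core idea: hash each file into a leaf, fold the leaves into a single root, and compare roots to decide quickly whether anything needs re-indexing. All function names are hypothetical. Unlike a directory-shaped tree, this version does not prune unchanged subtrees; once the roots differ, it compares every leaf.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def leaf_hashes(files: Dict[str, bytes]) -> Dict[str, str]:
    """Hash each file's path and content into a leaf digest."""
    return {path: sha256_hex(path.encode() + b"\0" + content) for path, content in files.items()}


def merkle_root(leaves: Dict[str, str]) -> str:
    """Fold the sorted leaf digests pairwise until a single root remains."""
    level: List[str] = [leaves[p] for p in sorted(leaves)]
    if not level:
        return sha256_hex(b"")
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [sha256_hex((a + b).encode()) for a, b in zip(level[0::2], level[1::2])]
    return level[0]


def changed_files(old: Dict[str, bytes], new: Dict[str, bytes]) -> List[str]:
    """Return paths that were added, removed, or modified."""
    old_leaves, new_leaves = leaf_hashes(old), leaf_hashes(new)
    if merkle_root(old_leaves) == merkle_root(new_leaves):
        return []  # identical roots: nothing to re-index
    paths = set(old_leaves) | set(new_leaves)
    return sorted(p for p in paths if old_leaves.get(p) != new_leaves.get(p))


v1 = {"a.py": b"print(1)", "b.py": b"x = 2"}
v2 = {"a.py": b"print(1)", "b.py": b"x = 3", "c.py": b""}
print(changed_files(v1, v2))
print(changed_files(v1, dict(v1)))
```

Hashing the path together with the content means a rename shows up as one removal plus one addition rather than going unnoticed.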