| # Production-Grade RAG: Developer Documentation Assistant | |
| ## Implementation Plan | |
| --- | |
| ## Project Overview | |
| A domain-specific "Ask my Docs" system built for developer documentation. | |
| Users can query any SDK/API/framework documentation and receive accurate answers with citations. | |
| **Initial Target Domain:** FastAPI Documentation | |
| ## Locked MVP Choices | |
| The MVP is intentionally constrained to one implementation path. Any alternatives listed in earlier notes are out of scope until the baseline system is working and evaluated. | |
| | Area | MVP Choice | | |
| |---|---| | |
| | Documentation source | Official FastAPI documentation pages only | | |
| | Ingestion method | `BeautifulSoup` crawler + local normalized markdown/json output | | |
| | Extra sources | No GitHub `/docs` sync, no PDF upload, no arbitrary `.md` ingestion in MVP | | |
| | Chunking | Structure-aware chunking with LangChain `RecursiveCharacterTextSplitter` | | |
| | Embeddings | OpenAI `text-embedding-3-small` | | |
| | Vector store | `ChromaDB` only | | |
| | Keyword retrieval | `rank_bm25` | | |
| | Fusion | Reciprocal Rank Fusion (RRF) | | |
| | Re-ranker | Local `cross-encoder/ms-marco-MiniLM-L-6-v2` | | |
| | Answer model | OpenAI `gpt-4o-mini` | | |
| | API layer | `FastAPI` | | |
| | Evaluation | `Ragas` + a hand-written FastAPI eval set | | |
| | CI gate | Lightweight retrieval/regression checks on PRs; full eval run on demand or nightly | | |
| ## MVP Scope Boundaries | |
| - Index only the official FastAPI docs corpus. | |
| - Treat the MVP as a single-tenant local/dev system. | |
| - Optimize for answer quality and citation correctness before UI polish. | |
| - Defer multi-doc search, version-aware retrieval, PDF ingestion, and production vector DB migration until after MVP validation. | |
| --- | |
| ## Phase 1: Fundamentals (Data Ingestion Pipeline) | |
| ### Step 1: Document Collection | |
| - Crawl official FastAPI documentation using `BeautifulSoup` | |
| - Normalize each page into local markdown/json before chunking | |
| - Store raw HTML plus cleaned content locally for reproducibility | |
| - Exclude GitHub docs sync, PDF upload, and arbitrary file ingestion from MVP | |
| - Acceptance criteria: | |
| - All target FastAPI pages are fetched successfully | |
| - Each page is stored with `source_url`, `page_title`, and crawl timestamp | |
| - Re-running ingestion updates changed pages without duplicating records | |
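
The "update without duplicating" criterion can be sketched as a URL-keyed upsert that compares content hashes. This is a minimal illustration, not the project's actual storage code: the `store` dict stands in for the local JSON output, and the `content_hash` and `crawled_at` field names are assumptions.

```python
import hashlib
from datetime import datetime, timezone


def content_hash(cleaned_text: str) -> str:
    """Stable fingerprint of the cleaned page content."""
    return hashlib.sha256(cleaned_text.encode("utf-8")).hexdigest()


def upsert_page(store: dict, source_url: str, page_title: str, cleaned_text: str) -> str:
    """Insert or update one page record keyed by its URL.

    Returns "created", "updated", or "unchanged".
    `store` is a hypothetical in-memory stand-in for the local JSON output.
    """
    new_hash = content_hash(cleaned_text)
    existing = store.get(source_url)
    if existing is not None and existing["content_hash"] == new_hash:
        return "unchanged"  # same content: keep the old record and timestamp
    status = "created" if existing is None else "updated"
    store[source_url] = {
        "source_url": source_url,
        "page_title": page_title,
        "content": cleaned_text,
        "content_hash": new_hash,  # assumed field, used only for change detection
        "crawled_at": datetime.now(timezone.utc).isoformat(),
    }
    return status
```

Keying on `source_url` means a re-crawl can only overwrite a record, never append a second one for the same page.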
| ### Step 2: Chunking Strategy | |
| - Use LangChain `RecursiveCharacterTextSplitter` | |
| - Chunk size target: **700 tokens**, measured with a token-based length function (the splitter counts characters by default) | |
| - Overlap: **100 tokens** | |
| - Preserve code blocks and heading boundaries whenever possible | |
| - Tag each chunk with metadata: | |
| - `source_url` | |
| - `page_title` | |
| - `section_title` | |
| - `chunk_id` | |
| - `doc_version` | |
| - Acceptance criteria: | |
| - No chunk splits in the middle of fenced code blocks | |
| - Every chunk can be traced back to an exact source page and section | |
| - Chunk output is deterministic for unchanged source documents | |
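
One way to meet the "no split inside a fenced code block" and "deterministic output" criteria is to cut the page into whole blocks first and only then pack blocks into chunks. The sketch below is a plain-Python stand-in for that pre-splitting idea, not the `RecursiveCharacterTextSplitter` itself. It approximates tokens by whitespace-separated words, omits the 100-token overlap, and emits an oversized block as its own chunk rather than splitting it. All function names are hypothetical.

```python
import hashlib

FENCE = "`" * 3  # markdown code-fence marker


def split_blocks(markdown: str) -> list[str]:
    """Split markdown into paragraphs, heading-led blocks, and whole fenced code blocks."""
    blocks, current, in_fence = [], [], False

    def flush():
        if current:
            blocks.append("\n".join(current))
            current.clear()

    for line in markdown.splitlines():
        is_fence = line.lstrip().startswith(FENCE)
        if in_fence:
            current.append(line)      # keep every line of the code block, blank ones too
            if is_fence:              # closing fence ends the block
                in_fence = False
                flush()
        elif is_fence:                # opening fence starts a new block
            flush()
            in_fence = True
            current.append(line)
        elif line.startswith("#"):    # a heading starts a new block
            flush()
            current.append(line)
        elif not line.strip():        # blank line separates paragraphs
            flush()
        else:
            current.append(line)
    flush()
    return blocks


def pack_chunks(blocks: list[str], max_tokens: int = 700) -> list[str]:
    """Greedily pack blocks into chunks, breaking at headings and at the size budget."""
    chunks, current, size = [], [], 0
    for block in blocks:
        n = len(block.split())        # crude token estimate (assumption)
        if current and (size + n > max_tokens or block.startswith("#")):
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(block)
        size += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def chunk_id(source_url: str, index: int, text: str) -> str:
    """Deterministic ID: the same page, position, and text always give the same ID."""
    raw = f"{source_url}\n{index}\n{text}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]
```

Because the ID is derived only from the URL, position, and text, unchanged source documents produce identical chunk IDs across runs.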
| ### Step 3: Embeddings + Vector Store | |
| - Embedding model: OpenAI `text-embedding-3-small` | |
| - Store embeddings in **ChromaDB** | |
| - Each vector carries full metadata from Step 2 | |
| - Acceptance criteria: | |
| - Full FastAPI corpus is embedded successfully | |
| - Re-indexing does not create duplicate vectors for unchanged chunks | |
| - Querying Chroma returns chunk metadata required for citation display | |
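
Duplicate vectors are easiest to avoid when every write is keyed by the deterministic `chunk_id` from Step 2. The sketch below only builds the payload and does not talk to Chroma. It assumes each chunk record is a dict holding the Step 2 metadata plus a hypothetical `text` field.

```python
METADATA_KEYS = ("source_url", "page_title", "section_title", "chunk_id", "doc_version")


def build_upsert_payload(chunks: list[dict]) -> dict:
    """Turn chunk records into parallel lists keyed by chunk_id.

    Records that share a chunk_id collapse into one entry (last write wins),
    so re-indexing unchanged chunks cannot add a second vector.
    """
    by_id = {}
    for chunk in chunks:
        by_id[chunk["chunk_id"]] = chunk
    ids = sorted(by_id)  # stable order keeps runs reproducible
    return {
        "ids": ids,
        "documents": [by_id[i]["text"] for i in ids],  # "text" is an assumed field name
        "metadatas": [{key: by_id[i][key] for key in METADATA_KEYS} for i in ids],
    }


# Assumption: with chromadb installed, the payload could be written with
# collection.upsert(**payload, embeddings=vectors), where `vectors` holds the
# text-embedding-3-small output in the same order as payload["ids"].
```

An upsert keyed by ID overwrites an existing entry instead of appending, which is what makes repeated indexing runs safe.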
| --- | |
| ## Phase 2: Hybrid Retrieval | |
| ### Step 4: BM25 + Semantic Search | |
| - BM25 via `rank_bm25` library for keyword matching | |
| - Vector similarity search via ChromaDB | |
| - Merge both result sets using **Reciprocal Rank Fusion (RRF)** | |
| - Retrieve top 10 BM25 hits and top 10 vector hits before fusion | |
| - Acceptance criteria: | |
| - Known keyword-heavy queries surface exact-term matches | |
| - Known semantic queries surface relevant conceptual matches | |
| - Fused retrieval performs better than vector-only on the seed eval set | |
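
Reciprocal Rank Fusion needs only the rank positions from each retriever, so it can be sketched without either library. Each document scores the sum of `1 / (k + rank)` over the lists it appears in. The constant `k = 60` is a commonly used default and an assumption here, not a value taken from the plan.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of chunk IDs into one list sorted by RRF score.

    rankings: one list per retriever, best hit first (e.g. BM25 top 10, vector top 10).
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by score, highest first; break ties by ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_hits = ["a", "b", "c"]
vector_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print([doc_id for doc_id, _ in fused])  # ['b', 'a', 'd', 'c']
```

Chunks found by both retrievers ("a" and "b") rise above chunks found by only one, which is the behavior the fusion step relies on.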
| ### Step 5: Cross Encoder Re-ranker | |
| - Use local `cross-encoder/ms-marco-MiniLM-L-6-v2` | |
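| - Re-rank the top 20 fused results (at most 20 unique candidates from Step 4), then pass the top 5 to the LLM | |
| - Expected to be the largest single quality gain in the pipeline; verify this against the seed eval set | |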
| - Acceptance criteria: | |
| - Re-ranking is applied on every answer path | |
| - Top 5 contexts are logged for debugging and eval review | |
| - Reranked results improve answer grounding on the seed eval set | |
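
The re-ranking step can be sketched as a function that takes the scoring model as a parameter, which keeps the logic testable without downloading a model. The `rerank` name, the `text` field, and the `rerank_score` key are assumptions for illustration.

```python
from typing import Callable, Sequence

ScoreFn = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: list[dict], score_fn: ScoreFn, top_n: int = 5) -> list[dict]:
    """Score every (query, chunk text) pair and keep the top_n highest-scoring chunks."""
    if not candidates:
        return []
    pairs = [(query, candidate["text"]) for candidate in candidates]
    scores = score_fn(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    # Attach the score so the top contexts can be logged for debugging and eval review.
    return [{**candidate, "rerank_score": float(score)} for candidate, score in ranked[:top_n]]


# Assumption: with sentence-transformers installed, the real scorer could be
#   from sentence_transformers import CrossEncoder
#   score_fn = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict
# since predict() takes a list of (query, passage) pairs and returns one score per pair.
```

Passing the scorer in also makes it straightforward to compare re-ranked and non-re-ranked retrieval on the seed eval set.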
| ### Step 6: Citation Enforcement | |
| - Each retrieved chunk carries `source_url` + `section_title` | |
| - Prompt the LLM to answer only using provided context | |
| - Force structured output: answer + list of cited chunks | |
| - If no relevant chunk is found, return "I don't know" (no hallucination) | |
| - Validate that every cited chunk ID exists in the retrieved context set | |
| - Acceptance criteria: | |
| - API response returns answer text plus machine-readable citations | |
| - Unsupported answers are rejected or converted to "I don't know" | |
| - Manual spot checks confirm citations map to relevant evidence | |
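
The validation rule above can be sketched as a post-processing step on the model's structured output. It assumes the output has already been parsed into a dict with hypothetical `answer` and `citations` keys, where `citations` is a list of chunk IDs.

```python
IDK = "I don't know"


def enforce_citations(llm_output: dict, retrieved_ids: set[str]) -> dict:
    """Accept an answer only if it cites at least one chunk and every cited ID was retrieved."""
    answer = str(llm_output.get("answer", "")).strip()
    cited = list(llm_output.get("citations", []))
    supported = bool(answer) and bool(cited) and all(c in retrieved_ids for c in cited)
    if not supported:
        # Unsupported or uncited answers are converted rather than passed through.
        return {"answer": IDK, "citations": []}
    return {"answer": answer, "citations": cited}
```

This check only proves that the cited IDs exist in the context set. Whether the cited chunk actually supports the claim still needs the manual spot checks and the faithfulness metric.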
| --- | |
| ## Phase 3: Evaluation Pipeline | |
| ### Step 7: Build Eval Dataset | |
| - Manually write **30-50 Q&A pairs** from FastAPI docs for the initial eval set | |
| - Cover: factual questions, code questions, comparison questions | |
| - Store as JSON/CSV in the repo | |
| - Expand to 100+ only after the first reliable baseline is in place | |
| - Acceptance criteria: | |
| - Eval set covers multiple sections of the docs | |
| - Each question has an expected answer and supporting source reference | |
| - The dataset is stable enough to compare runs over time | |
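
A small validator helps keep the hand-written set consistent as it grows. The record shape below is an assumption: the field names `question`, `expected_answer`, `source_url`, and `category` are hypothetical, and the categories mirror the three question types listed above.

```python
REQUIRED_FIELDS = ("question", "expected_answer", "source_url", "category")
CATEGORIES = {"factual", "code", "comparison"}


def validate_eval_set(records: list[dict]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the set is valid."""
    errors = []
    for i, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            if not str(record.get(field, "")).strip():
                errors.append(f"record {i}: missing {field}")
        category = record.get("category")
        if category and category not in CATEGORIES:
            errors.append(f"record {i}: unknown category {category!r}")
    return errors
```

Running this in the PR workflow would catch malformed entries before they distort a metrics comparison.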
| ### Step 8: Ragas Evaluation Metrics | |
| | Metric | What it measures | | |
| |---|---| | |
| | `faithfulness` | Are the claims in the answer supported by the retrieved context? | |
| | `answer_relevancy` | Does the answer address the question that was asked? | |
| | `context_precision` | Are relevant chunks ranked above irrelevant ones? | |
| | `context_recall` | Are all relevant chunks being found? | |
| ### Step 9: CI Integration | |
| - Run lightweight regression checks on every PR via **GitHub Actions** | |
| - Reserve full Ragas evaluation for scheduled or manually triggered runs | |
| - Use an initial target such as `faithfulness >= 0.85` as a baseline, not a permanent hard-coded gate | |
| - Store eval results over time to track regression | |
| - Acceptance criteria: | |
| - PR workflow catches obvious retrieval or API regressions quickly | |
| - Full eval workflow produces repeatable metrics artifacts | |
| - Baseline metrics are documented before strict CI thresholds are enforced | |
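
Comparing each run against a stored baseline, rather than a fixed threshold, can be sketched as follows. The `tolerance` of 0.02 is an assumed allowance for run-to-run noise, and the metric dicts stand in for whatever the eval workflow writes as its artifact.

```python
def find_regressions(current: dict, baseline: dict, tolerance: float = 0.02) -> dict:
    """Return {metric: (baseline_value, current_value)} for metrics that dropped too far.

    A metric missing from `current` is reported as a regression with value None.
    """
    regressions = {}
    for metric, base in baseline.items():
        value = current.get(metric)
        if value is None or value < base - tolerance:
            regressions[metric] = (base, value)
    return regressions


baseline = {"faithfulness": 0.85, "context_recall": 0.80}
current = {"faithfulness": 0.86, "context_recall": 0.75}
print(find_regressions(current, baseline))  # {'context_recall': (0.8, 0.75)}
```

A CI job could fail when the returned dict is non-empty, once baseline metrics have been documented.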
| --- | |
| ## Tech Stack | |
| | Layer | Tool | | |
| |---|---| | |
| | Orchestration | LangChain | | |
| | Vector Store (dev) | ChromaDB | | |
| | Vector Store (prod) | Deferred until post-MVP | | |
| | Reranker | `cross-encoder/ms-marco-MiniLM-L-6-v2` | | |
| | Evaluation | Ragas | | |
| | LLM | `gpt-4o-mini` | | |
| | Scraping | BeautifulSoup | | |
| | CI/CD | GitHub Actions | | |
| | API Layer | FastAPI | | |
| | Frontend (optional) | Deferred until post-MVP | | |
| --- | |
| ## Suggested Build Timeline | |
| | Week | Focus | | |
| |---|---| | |
| | Week 1 | Phase 1: Ingest, chunk, embed, store | |
| | Week 2 | Phase 2: BM25 + rerank + citation enforcement | |
| | Week 3 | Phase 3: Eval dataset + Ragas + CI pipeline | |
| | Week 4 | API hardening + README + baseline evaluation review | | |
| --- | |
| ## Bonus Features (Post-MVP) | |
| - **Version-aware retrieval**: "In v2 vs v3, how does X work?" | |
| - **Code snippet extraction**: return relevant code blocks alongside answers | |
| - **Multi-doc search**: compare two frameworks side by side | |
| - **Query rewriting**: auto-expand vague queries before retrieval | |
| --- | |
| *Based on project notes from January 10–11, 2026* | |