Spaces:
Paused
Paused
Production Grade RAG β Developer Documentation Assistant
Implementation Plan
Project Overview
A domain-specific "Ask my Docs" system built for developer documentation. Users can query any SDK/API/framework documentation and receive accurate answers with citations.
Initial Target Domain: FastAPI Documentation
Locked MVP Choices
The MVP is intentionally constrained to one implementation path. Any alternatives listed in earlier notes are out of scope until the baseline system is working and evaluated.
| Area | MVP Choice |
|---|---|
| Documentation source | Official FastAPI documentation pages only |
| Ingestion method | BeautifulSoup crawler + local normalized markdown/json output |
| Extra sources | No GitHub /docs sync, no PDF upload, no arbitrary .md ingestion in MVP |
| Chunking | Structure-aware chunking with LangChain RecursiveCharacterTextSplitter |
| Embeddings | OpenAI text-embedding-3-small |
| Vector store | ChromaDB only |
| Keyword retrieval | rank_bm25 |
| Fusion | Reciprocal Rank Fusion (RRF) |
| Re-ranker | Local cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Answer model | OpenAI gpt-4o-mini |
| API layer | FastAPI |
| Evaluation | Ragas + a hand-written FastAPI eval set |
| CI gate | Lightweight retrieval/regression checks on PRs; full eval run on demand or nightly |
MVP Scope Boundaries
- Index only the official FastAPI docs corpus.
- Treat the MVP as a single-tenant local/dev system.
- Optimize for answer quality and citation correctness before UI polish.
- Defer multi-doc search, version-aware retrieval, PDF ingestion, and production vector DB migration until after MVP validation.
Phase 1: Fundamentals (Data Ingestion Pipeline)
Step 1 β Document Collection
- Crawl official FastAPI documentation using
BeautifulSoup - Normalize each page into local markdown/json before chunking
- Store raw HTML plus cleaned content locally for reproducibility
- Exclude GitHub docs sync, PDF upload, and arbitrary file ingestion from MVP
- Acceptance criteria:
- All target FastAPI pages are fetched successfully
- Each page is stored with
source_url,page_title, and crawl timestamp - Re-running ingestion updates changed pages without duplicating records
Step 2 β Chunking Strategy
- Use LangChain
RecursiveCharacterTextSplitter - Chunk size target: 700 tokens
- Overlap: 100 tokens
- Preserve code blocks and heading boundaries whenever possible
- Tag each chunk with metadata:
source_urlpage_titlesection_titlechunk_iddoc_version
- Acceptance criteria:
- No chunk splits in the middle of fenced code blocks
- Every chunk can be traced back to an exact source page and section
- Chunk output is deterministic for unchanged source documents
Step 3 β Embeddings + Vector Store
- Embedding model: OpenAI
text-embedding-3-small - Store embeddings in ChromaDB
- Each vector carries full metadata from Step 2
- Acceptance criteria:
- Full FastAPI corpus is embedded successfully
- Re-indexing does not create duplicate vectors for unchanged chunks
- Querying Chroma returns chunk metadata required for citation display
Phase 2: Hybrid Retrieval
Step 4 β BM25 + Semantic Search
- BM25 via
rank_bm25library for keyword matching - Vector similarity search via ChromaDB
- Merge both result sets using Reciprocal Rank Fusion (RRF)
- Retrieve top 10 BM25 hits and top 10 vector hits before fusion
- Acceptance criteria:
- Known keyword-heavy queries surface exact-term matches
- Known semantic queries surface relevant conceptual matches
- Fused retrieval performs better than vector-only on the seed eval set
Step 5 β Cross Encoder Re-ranker
- Use local
cross-encoder/ms-marco-MiniLM-L-6-v2 - Re-rank top 20 results β pass top 5 to LLM
- Biggest quality boost in the pipeline
- Acceptance criteria:
- Re-ranking is applied on every answer path
- Top 5 contexts are logged for debugging and eval review
- Reranked results improve answer grounding on the seed eval set
Step 6 β Citation Enforcement
- Each retrieved chunk carries
source_url+section_title - Prompt the LLM to answer only using provided context
- Force structured output: answer + list of cited chunks
- If no relevant chunk found β return "I don't know" (no hallucination)
- Validate that every cited chunk ID exists in the retrieved context set
- Acceptance criteria:
- API response returns answer text plus machine-readable citations
- Unsupported answers are rejected or converted to "I don't know"
- Manual spot checks confirm citations map to relevant evidence
Phase 3: Evaluation Pipeline
Step 7 β Build Eval Dataset
- Manually write 30-50 Q&A pairs from FastAPI docs for the initial eval set
- Cover: factual questions, code questions, comparison questions
- Store as JSON/CSV in the repo
- Expand to 100+ only after the first reliable baseline is in place
- Acceptance criteria:
- Eval set covers multiple sections of the docs
- Each question has an expected answer and supporting source reference
- The dataset is stable enough to compare runs over time
Step 8 β Ragas Evaluation Metrics
| Metric | What it measures |
|---|---|
faithfulness |
Does the answer match retrieved context? |
answer_relevancy |
Is the answer on-topic? |
context_precision |
Are the right chunks being retrieved? |
context_recall |
Are all relevant chunks being found? |
Step 9 β CI Integration
- Run lightweight regression checks on every PR via GitHub Actions
- Reserve full Ragas evaluation for scheduled or manually triggered runs
- Use an initial target such as
faithfulness >= 0.85as a baseline, not a permanent hard-coded gate - Store eval results over time to track regression
- Acceptance criteria:
- PR workflow catches obvious retrieval or API regressions quickly
- Full eval workflow produces repeatable metrics artifacts
- Baseline metrics are documented before strict CI thresholds are enforced
Tech Stack
| Layer | Tool |
|---|---|
| Orchestration | LangChain |
| Vector Store (dev) | ChromaDB |
| Vector Store (prod) | Deferred until post-MVP |
| Reranker | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Evaluation | Ragas |
| LLM | gpt-4o-mini |
| Scraping | BeautifulSoup |
| CI/CD | GitHub Actions |
| API Layer | FastAPI |
| Frontend (optional) | Deferred until post-MVP |
Suggested Build Timeline
| Week | Focus |
|---|---|
| Week 1 | Phase 1 β Ingest, chunk, embed, store |
| Week 2 | Phase 2 β BM25 + rerank + citation enforcement |
| Week 3 | Phase 3 β Eval dataset + Ragas + CI pipeline |
| Week 4 | API hardening + README + baseline evaluation review |
Bonus Features (Post-MVP)
- Version-aware retrieval β "In v2 vs v3, how does X work?"
- Code snippet extraction β return relevant code blocks alongside answers
- Multi-doc search β compare two frameworks side by side
- Query rewriting β auto-expand vague queries before retrieval
Based on project notes from January 10β11, 2026